Abstract. Conditional independence models in the Gaussian case are algebraic varieties in the
cone of positive definite covariance matrices. We study these varieties in the case of Bayesian
networks, with a view towards generalizing the recursive factorization theorem to situations with
hidden variables. In the case when the underlying graph is a tree, we show that the vanishing
ideal of the model is generated by the conditional independence statements implied by the graph.
We also show that the ideal of any Bayesian network is homogeneous with respect to a
multigrading induced by a collection of upstream random variables. This has a number of important
consequences for hidden variable models. Finally, we relate the ideals of Bayesian networks to
a number of classical constructions in algebraic geometry including toric degenerations of the
Grassmannian, matrix Schubert varieties, and secant varieties.



1. Introduction 

A Bayesian network or directed graphical model is a statistical model that uses a directed 
acyclic graph (DAG) to represent the conditional independence structures between collections 
of random variables. The word Bayesian is used to describe these models because the nodes
in the graph can be used to represent random variables that correspond to parameters or
hyperparameters, though the basic models themselves are not a priori Bayesian. These models
are used throughout computational statistics to model complex interactions between collections
of random variables. For instance, tree models are used in computational biology for sequence
alignment [4] and in phylogenetics [5, 15]. Special cases of Bayesian networks include familiar
models from statistics like factor analysis [3] and the hidden Markov model [1].

The DAG that specifies the Bayesian network specifies the model in two ways. The first is 
through a recursive factorization of the parametrization, via restricted conditional distributions. 
The second method is via the conditional independence statements implied by the graph. The 
recursive factorization theorem [13, Thm 3.27] says that these two methods for specifying a
Bayesian network yield the same family of probability density functions. 

When the underlying random variables are Gaussian or discrete, conditional independence 
statements can be interpreted as algebraic constraints on the parameter space of the global 
model. In the Gaussian case, this means that conditional independence corresponds to algebraic 
constraints on the cone of positive definite matrices. One of our main goals in this paper is to 
explore the recursive factorization theorem using algebraic techniques in the case of Gaussian 
random variables, with a view towards the case of hidden random variables. In this sense, the 
current paper is a generalization of the work begun in [3], which concerned the special case of
factor analysis. Some past work has been done on the algebraic geometry of Bayesian networks
in the discrete case in [6, 7], but there are many open questions that remain in both the Gaussian
and the discrete case. 

In the next section, we describe a combinatorial parametrization of a Bayesian network in
the Gaussian case. In statistics, this parametrization is known as the trek rule [17]. We also
describe the algebraic interpretation of conditional independence in the Gaussian case, which
leads us to our main problem: comparing the vanishing ideal of the model $I_G$ to the conditional
independence ideal $C_G$. Section 3 describes the results of computations regarding the ideals
of Bayesian networks, and some algebraic conjectures that these computations suggest. In
particular, we conjecture that the coordinate ring of a Bayesian network is always normal and
Cohen-Macaulay.

As a first application of our algebraic perspective on Gaussian Bayesian networks, we provide a
new and greatly simplified proof of the tetrad representation theorem [17, Thm 6.10] in Section
4. Then in Section 5 we provide an extensive study of trees in the fully observed case. In
particular, we prove that for any tree $T$, the ideal $I_T$ is a toric ideal generated by linear forms
and quadrics that correspond to conditional independence statements implied by $T$. Techniques
from polyhedral geometry are used to show that $\mathbb{C}[\Sigma]/I_T$ is always normal and Cohen-Macaulay.

Sections 6 and 7 are concerned with the study of hidden variable models. In Section 6 we
prove the Upstream Variables Theorem (Theorem 6.4), which shows that $I_G$ is homogeneous
with respect to a two dimensional multigrading induced by upstream random variables. As
a corollary, we deduce that hidden tree models are generated by tetrad constraints. Finally,
in Section 7 we show that models with hidden variables include, as special cases, a number
of classical constructions from algebraic geometry. These include toric degenerations of the
Grassmannian, matrix Schubert varieties, and secant varieties.

Acknowledgments. I would like to thank Mathias Drton, Thomas Richardson, Mike Stillman, 
and Bernd Sturmfels for helpful comments and discussions about the results in this paper. The 
IMA provided funding and computer equipment while I worked on parts of this project. 

2. Parametrization and Conditional Independence 

Let $G$ be a directed acyclic graph (DAG) with vertex set $V(G)$ and edge set $E(G)$. Often,
we will assume that $V(G) = [n] := \{1, 2, \ldots, n\}$. To guarantee the acyclic assumption, we
assume that the vertices are numerically ordered; that is, $i \to j \in E(G)$ only if $i < j$. The
Bayesian network associated to this graph can be specified by either a recursive factorization 
formula or by conditional independence statements. We focus first on the recursive factorization 
representation, and use it to derive an algebraic description of the parametrization. Then we 
introduce the conditional independence constraints that vanish on the model and the ideal that 
these constraints generate. 

Let $X = (X_1, \ldots, X_n)$ be a random vector, and let $f(x)$ denote the probability density
function of this random vector. Bayes' theorem says that this joint density can be factorized as
a product

\[ f(x) = \prod_{i=1}^{n} f_i(x_i \mid x_1, \ldots, x_{i-1}), \]

where $f_i(x_i \mid x_1, \ldots, x_{i-1})$ denotes the conditional density of $X_i$ given
$X_1 = x_1, \ldots, X_{i-1} = x_{i-1}$. The recursive factorization property of the graphical model
is that each of the conditional densities $f_i(x_i \mid x_1, \ldots, x_{i-1})$ only depends on the
parents $\mathrm{pa}(i) = \{ j \in [n] \mid j \to i \in E(G) \}$. We can rewrite this representation as

\[ f_i(x_i \mid x_1, \ldots, x_{i-1}) = f_i(x_i \mid x_{\mathrm{pa}(i)}). \]

Thus, a density function $f$ belongs to the Bayesian network if it factorizes as

\[ f(x) = \prod_{i=1}^{n} f_i(x_i \mid x_{\mathrm{pa}(i)}). \]

To explore the consequences of this parametrization in the Gaussian case, we first need to
recall some basic facts about Gaussian random variables. Each $n$-dimensional Gaussian random
variable $X$ is completely specified by its mean vector $\mu$ and its positive definite covariance matrix
$\Sigma$. Given these data, the joint density function of $X$ is given by

\[ f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right), \]

where $|\Sigma|$ is the determinant of $\Sigma$. Rather than writing out the density every time, the
shorthand $X \sim \mathcal{N}(\mu, \Sigma)$ is used to indicate that $X$ is a Gaussian random variable with mean $\mu$
and covariance matrix $\Sigma$. The multivariate Gaussian generalizes the familiar "bell curve" of
a univariate Gaussian and is an important distribution in probability theory and multivariate
statistics because of the central limit theorem [1].

Given an $n$-dimensional random variable $X$ and $A \subseteq [n]$, let $X_A = (X_a)_{a \in A}$. Similarly, if
$x$ is a vector, then $x_A$ is the subvector indexed by $A$. For a matrix $\Sigma$, $\Sigma_{A,B}$ is the submatrix
of $\Sigma$ with row index set $A$ and column index set $B$. Among the nice properties of Gaussian
random variables is the fact that marginalization and conditioning both preserve the Gaussian
property; see [1].

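Lemma 2.1. Suppose that $X \sim \mathcal{N}(\mu, \Sigma)$ and let $A, B \subseteq [n]$ be disjoint. Then

(1) $X_A \sim \mathcal{N}(\mu_A, \Sigma_{A,A})$ and

(2) $X_A \mid X_B = x_B \sim \mathcal{N}\left( \mu_A + \Sigma_{A,B} \Sigma_{B,B}^{-1} (x_B - \mu_B),\ \Sigma_{A,A} - \Sigma_{A,B} \Sigma_{B,B}^{-1} \Sigma_{B,A} \right)$.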

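To build the Gaussian Bayesian network associated to the DAG $G$, we allow any Gaussian
conditional distribution for the distribution $f_j(x_j \mid x_{\mathrm{pa}(j)})$. This conditional distribution is recovered
by saying that

\[ X_j = \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} X_i + W_j, \]

where $W_j \sim \mathcal{N}(\nu_j, \psi_j^2)$ is independent of the $X_i$ with $i < j$, and the $\lambda_{ij}$ are the regression
parameters. Linear transformations of Gaussian random variables are Gaussian, and thus $X$ is
also a Gaussian random variable. Since $X$ is completely specified by its mean $\mu$ and covariance
matrix $\Sigma$, we must calculate these from the conditional distribution. The recursive expression
for the distribution of $X_j$ given the variables preceding it yields a straightforward and recursive
expression for the mean and covariance. Namely,

\[ \mu_j = E(X_j) = E\Big( \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} X_i + W_j \Big) = \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} \mu_i + \nu_j, \]

and, for $k < j$, the covariance is

\begin{align*}
\sigma_{kj} &= E\big( (X_k - \mu_k)(X_j - \mu_j) \big) \\
&= E\Big( (X_k - \mu_k) \Big( \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} (X_i - \mu_i) + W_j - \nu_j \Big) \Big) \\
&= \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} E\big( (X_k - \mu_k)(X_i - \mu_i) \big) + E\big( (X_k - \mu_k)(W_j - \nu_j) \big) \\
&= \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} \sigma_{ik},
\end{align*}

and the variance satisfies:

\begin{align*}
\sigma_{jj} &= E\big( (X_j - \mu_j)^2 \big) = E\Big( \Big( \sum_{i \in \mathrm{pa}(j)} \lambda_{ij} (X_i - \mu_i) + W_j - \nu_j \Big)^2 \Big) \\
&= \sum_{i \in \mathrm{pa}(j)} \sum_{k \in \mathrm{pa}(j)} \lambda_{ij} \lambda_{kj} \sigma_{ik} + \psi_j^2.
\end{align*}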

If there are no constraints on the vector $\nu$, there will be no constraints on $\mu$ either. Thus, we
will focus attention on the constraints on the covariance matrix $\Sigma$. If we further assume that the
$\psi_j$ are completely unconstrained, this will imply that we can replace the messy expression for
the variance $\sigma_{jj}$ by a simple new parameter $a_j$. This leads us to the algebraic representation
of our model, called the trek rule [17].

For each edge $i \to j \in E(G)$ let $\lambda_{ij}$ be an indeterminate and for each vertex $i \in V(G)$ let $a_i$
be an indeterminate. Assume that the vertices are numerically ordered, that is, $i \to j \in E(G)$
only if $i < j$. A collider is a pair of edges $i \to k$, $j \to k$ with the same head. For each pair of
vertices $i, j$ let $T(i,j)$ be the collection of simple paths $P$ in $G$ from $i$ to $j$ such that there is no
collider in $P$. Such a colliderless path is called a trek. The name trek comes from the fact that
every colliderless path from $i$ to $j$ consists of a path from $i$ up to some topmost element $\mathrm{top}(P)$
and then from $\mathrm{top}(P)$ back down to $j$. We think of each trek as a sequence of edges $k \to l$. If
$i = j$, $T(i,i)$ consists of a single empty trek from $i$ to itself.

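Let $\phi_G$ be the ring homomorphism

\[ \phi_G : \mathbb{C}[\sigma_{ij} \mid 1 \le i \le j \le n] \to \mathbb{C}[a_i, \lambda_{ij} \mid i, j \in [n],\ i \to j \in E(G)], \qquad
\sigma_{ij} \mapsto \sum_{P \in T(i,j)} a_{\mathrm{top}(P)} \prod_{k \to l \in P} \lambda_{kl}. \]

When $i = j$, we get $\phi_G(\sigma_{ii}) = a_i$. If there is no trek in $T(i,j)$, then $\phi_G(\sigma_{ij}) = 0$. Let $I_G = \ker \phi_G$.
Since $I_G$ is the kernel of a ring homomorphism into an integral domain, it is a prime ideal.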
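
Example 2.2. Let $G$ be the directed graph on four vertices with edges $1 \to 2$, $1 \to 3$, $2 \to 4$,
and $3 \to 4$. The ring homomorphism $\phi_G$ is given by

\begin{align*}
\sigma_{11} &\mapsto a_1, & \sigma_{12} &\mapsto a_1 \lambda_{12}, & \sigma_{13} &\mapsto a_1 \lambda_{13}, & \sigma_{14} &\mapsto a_1 \lambda_{12} \lambda_{24} + a_1 \lambda_{13} \lambda_{34}, \\
& & \sigma_{22} &\mapsto a_2, & \sigma_{23} &\mapsto a_1 \lambda_{12} \lambda_{13}, & \sigma_{24} &\mapsto a_2 \lambda_{24} + a_1 \lambda_{12} \lambda_{13} \lambda_{34}, \\
& & & & \sigma_{33} &\mapsto a_3, & \sigma_{34} &\mapsto a_3 \lambda_{34} + a_1 \lambda_{13} \lambda_{12} \lambda_{24}, \\
& & & & & & \sigma_{44} &\mapsto a_4.
\end{align*}

The ideal $I_G$ is the complete intersection of a quadric and a cubic:

\[ I_G = \langle \sigma_{11} \sigma_{23} - \sigma_{12} \sigma_{13},\ \sigma_{12} \sigma_{23} \sigma_{34} + \sigma_{13} \sigma_{24} \sigma_{23} + \sigma_{14} \sigma_{22} \sigma_{33} - \sigma_{12} \sigma_{24} \sigma_{33} - \sigma_{13} \sigma_{22} \sigma_{34} - \sigma_{14} \sigma_{23}^2 \rangle. \]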
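
The trek rule can be checked numerically on the four-vertex graph of Example 2.2. The following is a minimal sketch, not the computation used in the paper: the function name `trek_rule_covariance` and the parameter values are assumptions made for illustration, and instead of enumerating treks it uses the recursion $\sigma_{jj} = a_j$, $\sigma_{ij} = \sum_{k \in \mathrm{pa}(j)} \lambda_{kj} \sigma_{ik}$ for $i < j$, which produces the same polynomials. The cubic is written as the determinant of the submatrix of $\Sigma$ with rows 1, 2, 3 and columns 2, 3, 4.

```python
from fractions import Fraction


def trek_rule_covariance(n, lam, a):
    """Return sigma[i, j] (i <= j) for a DAG on vertices 1..n.

    lam[(i, j)] is the coefficient of the edge i -> j (with i < j) and
    a[i] is the variance parameter a_i = sigma_ii. The recursion
    sigma_ij = sum_{k in pa(j)} lam_kj * sigma_ik (i < j) is used in
    place of an explicit enumeration of treks.
    """
    sigma = {}
    for j in range(1, n + 1):
        parents = [k for k in range(1, j) if (k, j) in lam]
        for i in range(1, j):
            sigma[i, j] = sum(
                (lam[k, j] * sigma[min(i, k), max(i, k)] for k in parents),
                Fraction(0),
            )
        sigma[j, j] = Fraction(a[j])
    return sigma


# Graph of Example 2.2 with arbitrary (hypothetical) parameter values.
lam = {(1, 2): 2, (1, 3): 3, (2, 4): 5, (3, 4): 7}
a = {1: 1, 2: 11, 3: 13, 4: 2000}
s = trek_rule_covariance(4, lam, a)

# Quadric generator: sigma_11 * sigma_23 - sigma_12 * sigma_13.
quadric = s[1, 1] * s[2, 3] - s[1, 2] * s[1, 3]

# Cubic generator: determinant of Sigma with rows {1,2,3}, columns {2,3,4}.
cubic = (
    s[1, 2] * (s[2, 3] * s[3, 4] - s[2, 4] * s[3, 3])
    - s[1, 3] * (s[2, 2] * s[3, 4] - s[2, 4] * s[2, 3])
    + s[1, 4] * (s[2, 2] * s[3, 3] - s[2, 3] ** 2)
)
print(quadric, cubic)
```

Exact rational arithmetic is used so that the vanishing of both generators is an identity check and not a floating-point approximation.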

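Dual to the ring homomorphism is the rational parametrization

\[ \phi_G^*(a, \lambda) = \Big( \sum_{P \in T(i,j)} a_{\mathrm{top}(P)} \prod_{k \to l \in P} \lambda_{kl} \Big)_{i,j}. \]

We will often write $\sigma_{ij}(a, \lambda)$ to denote the coordinate polynomial that represents this function.
Let $\Omega \subset \mathbb{R}^{\#V(G) + \#E(G)}$ be the subset of parameter space satisfying the constraints:

\[ a_i > \sum_{j \in \mathrm{pa}(i)} \sum_{k \in \mathrm{pa}(i)} \lambda_{ji} \lambda_{ki} \sigma_{jk}(a, \lambda) \]

for all $i$, where in the case that $\mathrm{pa}(i) = \emptyset$ the sum is zero.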

Proposition 2.3. [Trek Rule] The set of covariance matrices in the Gaussian Bayesian network
associated to $G$ is the image $\phi_G^*(\Omega)$. In particular, $I_G$ is the vanishing ideal of the model.

The proof of the trek rule parametrization can also be found in [17].

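Proof. The proof goes by induction. First, we make the substitution

\[ a_j = \sum_{i \in \mathrm{pa}(j)} \sum_{k \in \mathrm{pa}(j)} \lambda_{ij} \lambda_{kj} \sigma_{ik} + \psi_j^2, \]

which is valid because, given the $\lambda_{ij}$'s, $\psi_j^2$ can be recovered from $a_j$ and vice versa. Clearly
$\sigma_{11} = a_1$. By induction, suppose that the desired formula holds for all $\sigma_{ij}$ with $i, j < n$. We
want to show that $\sigma_{in}$ has the same formula. Now from above, we have

\begin{align*}
\sigma_{in} &= \sum_{k \in \mathrm{pa}(n)} \lambda_{kn} \sigma_{ik} \\
&= \sum_{k \in \mathrm{pa}(n)} \lambda_{kn} \sum_{P \in T(i,k)} a_{\mathrm{top}(P)} \prod_{r \to s \in P} \lambda_{rs}.
\end{align*}

This last expression is a factorization of $\phi_G(\sigma_{in})$ since every trek from $i$ to $n$ is the union of a
trek $P \in T(i,k)$ and an edge $k \to n$ where $k$ is some parent of $n$. □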

The parameters used in the trek rule parametrization are a little unusual because they involve 
a mix of the natural parameters (regression coefficients $\lambda_{ij}$) and coordinates on the image space
(variance parameters $a_i$). While this mix might seem unusual from a statistical standpoint,
we find that this parametrization is rather useful for exploring the algebraic structure of the 
covariance matrices that come from the model. For instance: 

Corollary 2.4. If $T$ is a tree, then $I_T$ is a toric ideal.

Proof. For any pair of vertices $i, j$ in $T$, there is at most one trek between $i$ and $j$. Thus $\phi_T(\sigma_{ij})$
is a monomial and $I_T$ is a toric ideal. □
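
As a small worked illustration (our own example, assuming the chain $1 \to 2 \to 3$ counts as a tree in the sense used here), every $\sigma_{ij}$ maps to a single monomial, and the binomial attached to the statement that 1 is independent of 3 given 2 visibly maps to zero:

```latex
\begin{align*}
\sigma_{11} &\mapsto a_1, & \sigma_{12} &\mapsto a_1 \lambda_{12}, & \sigma_{13} &\mapsto a_1 \lambda_{12} \lambda_{23}, \\
\sigma_{22} &\mapsto a_2, & \sigma_{23} &\mapsto a_2 \lambda_{23}, & \sigma_{33} &\mapsto a_3, \\
\sigma_{12} \sigma_{23} - \sigma_{13} \sigma_{22}
  &\mapsto (a_1 \lambda_{12})(a_2 \lambda_{23}) - (a_1 \lambda_{12} \lambda_{23})\, a_2 = 0.
\end{align*}
```

The binomial is the $2 \times 2$ minor of $\Sigma$ with rows $\{1, 2\}$ and columns $\{2, 3\}$.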

In fact, as we will show in Section 5, when $T$ is a tree, $I_T$ is generated by linear forms and
quadratic binomials that correspond to conditional independence statements implied by the
graph. Before getting to properties of conditional independence, we first note that these models
are identifiable. That is, it is possible to recover the $\lambda_{ij}$ and $a_i$ parameters directly from $\Sigma$. This
also allows us to determine the most basic invariant of $I_G$, namely its dimension.

Proposition 2.5. The parametrization $\phi_G^*$ is birational. In other words, the model parameters
$\lambda_{ij}$ and $a_i$ are identifiable and $\dim I_G = \#V(G) + \#E(G)$.

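Proof. It suffices to prove that the parameters are identifiable via rational functions of the entries
of $\Sigma$, as all the other statements follow from this. We have $a_i = \sigma_{ii}$, so the $a_i$ parameters are
identifiable. We also know that for $i < j$

\[ \sigma_{ij} = \sum_{k \in \mathrm{pa}(j)} \sigma_{ik} \lambda_{kj}. \]

Thus, we have the matrix equation

\[ \Sigma_{\mathrm{pa}(j), j} = \Sigma_{\mathrm{pa}(j), \mathrm{pa}(j)} \lambda_{\mathrm{pa}(j), j}, \]

where $\lambda_{\mathrm{pa}(j), j}$ is the vector $(\lambda_{ij})^T_{i \in \mathrm{pa}(j)}$. Since $\Sigma_{\mathrm{pa}(j), \mathrm{pa}(j)}$ is invertible in the positive definite
cone, we have the rational formula

\[ \lambda_{\mathrm{pa}(j), j} = \Sigma_{\mathrm{pa}(j), \mathrm{pa}(j)}^{-1} \Sigma_{\mathrm{pa}(j), j} \]

and the $\lambda_{ij}$ parameters are identifiable. □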
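
The rational formula in this proof amounts to solving one small linear system per vertex. Below is a hedged sketch: the helper names `solve_linear` and `recover_lambdas` are hypothetical, and the covariance entries are hard-coded values obtained from the trek rule for the graph $1 \to 2$, $1 \to 3$, $2 \to 4$, $3 \to 4$ with $\lambda_{12} = 2$, $\lambda_{13} = 3$, $\lambda_{24} = 5$, $\lambda_{34} = 7$ and $(a_1, a_2, a_3, a_4) = (1, 11, 13, 2000)$, chosen only for illustration.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination.

    Assumes the matrix is square and invertible, which holds for principal
    submatrices of a positive definite covariance matrix.
    """
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def recover_lambdas(sigma, parents, j):
    """Recover lambda_{pa(j), j} = Sigma_{pa(j),pa(j)}^{-1} Sigma_{pa(j),j}."""
    sub = [[sigma[min(p, q), max(p, q)] for q in parents] for p in parents]
    rhs = [sigma[p, j] for p in parents]
    return dict(zip(parents, solve_linear(sub, rhs)))


# Upper-triangular entries sigma[i, j], i <= j (illustrative values, see above).
sigma = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3, (1, 4): 31,
    (2, 2): 11, (2, 3): 6, (2, 4): 97,
    (3, 3): 13, (3, 4): 121,
    (4, 4): 2000,
}
recovered = recover_lambdas(sigma, [2, 3], 4)
print(recovered)
```

The variance parameters need no computation at all, since $a_i = \sigma_{ii}$ can be read off the diagonal.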

One of the problems we want to explore is the connection between the prime ideal defining the 
graphical model (and thus the image of the parametrization) and the relationship to the ideal 
determined by the independence statements induced by the model. To explain this connection, 
we need to recall some information about the algebraic nature of conditional independence. 
Recall the definition of conditional independence. 

Definition 2.6. Let $A$, $B$, and $C$ be disjoint subsets of $[n]$, indexing subsets of the random
vector $X$. The conditional independence statement $A \perp\!\!\!\perp B \mid C$ ("$A$ is independent of $B$ given $C$")
holds if and only if

\[ f(x_A, x_B \mid x_C) = f(x_A \mid x_C) f(x_B \mid x_C) \]

for all $x_C$ such that $f(x_C) \neq 0$.

We refer to [13] for a more extensive introduction to conditional independence. In the
Gaussian case, a conditional independence statement is equivalent to an algebraic restriction on the
covariance matrix.

Proposition 2.7. Let $A, B, C$ be disjoint subsets of $[n]$. Then $X \sim \mathcal{N}(\mu, \Sigma)$ satisfies the
conditional independence constraint $A \perp\!\!\!\perp B \mid C$ if and only if the submatrix $\Sigma_{A \cup C, B \cup C}$ has rank less
than or equal to $\#C$.
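
Proof. If $X \sim \mathcal{N}(\mu, \Sigma)$, then

\[ X_{A \cup B} \mid X_C = x_C \sim \mathcal{N}\left( \mu_{A \cup B} + \Sigma_{A \cup B, C} \Sigma_{C,C}^{-1} (x_C - \mu_C),\ \Sigma_{A \cup B, A \cup B} - \Sigma_{A \cup B, C} \Sigma_{C,C}^{-1} \Sigma_{C, A \cup B} \right). \]

The CI statement $A \perp\!\!\!\perp B \mid C$ holds if and only if $(\Sigma_{A \cup B, A \cup B} - \Sigma_{A \cup B, C} \Sigma_{C,C}^{-1} \Sigma_{C, A \cup B})_{A,B} = 0$. The
$A, B$ submatrix of $\Sigma_{A \cup B, A \cup B} - \Sigma_{A \cup B, C} \Sigma_{C,C}^{-1} \Sigma_{C, A \cup B}$ is easily seen to be $\Sigma_{A,B} - \Sigma_{A,C} \Sigma_{C,C}^{-1} \Sigma_{C,B}$,
which is the Schur complement of the matrix

\[ \Sigma_{A \cup C, B \cup C} = \begin{pmatrix} \Sigma_{A,B} & \Sigma_{A,C} \\ \Sigma_{C,B} & \Sigma_{C,C} \end{pmatrix}. \]

Since $\Sigma_{C,C}$ is always invertible (it is positive definite), the Schur complement is zero if and only
if the matrix $\Sigma_{A \cup C, B \cup C}$ has rank less than or equal to $\#C$. □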
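
The rank criterion is easy to test on a concrete covariance matrix. The sketch below is illustrative only: `rank` and `ci_holds` are hypothetical helper names, and the entries of $\Sigma$ are hard-coded values produced by the trek rule for the graph $1 \to 2$, $1 \to 3$, $2 \to 4$, $3 \to 4$ with $\lambda_{12} = 2$, $\lambda_{13} = 3$, $\lambda_{24} = 5$, $\lambda_{34} = 7$ and $(a_1, a_2, a_3, a_4) = (1, 11, 13, 2000)$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix over the rationals, by exact row reduction."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        r += 1
    return r


def ci_holds(sigma, A, B, C):
    """Test A _||_ B | C via rank(Sigma_{A u C, B u C}) <= #C."""
    row_idx = list(A) + list(C)
    col_idx = list(B) + list(C)
    sub = [[sigma[min(p, q), max(p, q)] for q in col_idx] for p in row_idx]
    return rank(sub) <= len(C)


# Upper-triangular entries sigma[i, j], i <= j (illustrative values, see above).
sigma = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3, (1, 4): 31,
    (2, 2): 11, (2, 3): 6, (2, 4): 97,
    (3, 3): 13, (3, 4): 121,
    (4, 4): 2000,
}
print(ci_holds(sigma, [1], [4], [2, 3]))   # statement implied by the graph
print(ci_holds(sigma, [2], [3], [1, 4]))   # conditioning on the collider 4
```

For this particular matrix the first statement holds and the second fails, in line with conditioning on a common child creating dependence.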

Given a DAG G, a collection of conditional independence statements are forced on the joint 
distribution by the nature of the graph. These independence statements are usually described 
via the notion of d-separation (the d stands for "directed"). 

Definition 2.8. Let $A$, $B$, and $C$ be disjoint subsets of $[n]$. The set $C$ d-separates $A$ and $B$ if
every path in $G$ connecting a vertex $i \in A$ and a vertex $j \in B$ contains a vertex $k$ that is either

(1) a non-collider that belongs to $C$ or

(2) a collider that does not belong to $C$ and has no descendants that belong to $C$.

Note that C might be empty in the definition of d-separation. 

Proposition 2.9 ([13]). The conditional independence statement $A \perp\!\!\!\perp B \mid C$ holds for the Bayesian
network associated to $G$ if and only if $C$ d-separates $A$ from $B$ in $G$.
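
For experimentation, d-separation can be decided without enumerating paths. The sketch below uses the standard equivalent moralization criterion, not the path-based definition itself: $C$ d-separates $A$ from $B$ exactly when $C$ separates $A$ from $B$ in the moral graph of the subgraph induced on the ancestors of $A \cup B \cup C$. The function name `d_separated` is an assumption for illustration, and the test graph is the five-vertex DAG with edges $1 \to 3$, $1 \to 5$, $2 \to 3$, $2 \to 4$, $3 \to 4$, $4 \to 5$.

```python
from itertools import combinations


def d_separated(edges, A, B, C):
    """Return True if C d-separates A from B in the DAG given by edges."""
    A, B, C = set(A), set(B), set(C)
    parents = {}
    for i, j in edges:
        parents.setdefault(j, set()).add(i)

    # Smallest ancestral set containing A, B and C.
    keep, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))

    # Moral graph: drop directions and marry parents of a common child.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pa = parents.get(v, set())
        for p in pa:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(pa, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # Search from A in the moral graph without entering C.
    seen, stack = set(), list(A)
    while stack:
        v = stack.pop()
        if v in seen or v in C:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return not (seen & B)


edges = [(1, 3), (1, 5), (2, 3), (2, 4), (3, 4), (4, 5)]
print(d_separated(edges, {1}, {2}, set()))
print(d_separated(edges, {1}, {4}, {2, 3}))
print(d_separated(edges, {2, 3}, {5}, {1, 4}))
```

Combined with a list of all disjoint triples $(A, B, C)$, such a routine yields the index sets of the minors that generate a conditional independence ideal.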

A joint probability distribution that satisfies all the conditional independence statements 
implied by the graph G is said to satisfy the global Markov property of G. The following theorem 
is a staple of the literature on graphical models and holds with respect to any $\sigma$-algebra.

Theorem 2.10 (Recursive Factorization Theorem). [13, Thm 3.27] A probability density has
the recursive factorization property with respect to $G$ if and only if it satisfies the global Markov
property.

Definition 2.11. Let $C_G \subseteq \mathbb{C}[\Sigma]$ be the ideal generated by the minors of $\Sigma$ corresponding to
the conditional independence statements implied by $G$; that is,

\[ C_G = \langle (\#C + 1) \text{ minors of } \Sigma_{A \cup C, B \cup C} \mid C \text{ d-separates } A \text{ from } B \text{ in } G \rangle. \]

The ideal $C_G$ is called the conditional independence ideal of $G$.

A direct geometric consequence of the recursive factorization theorem is the following.

Corollary 2.12. For any DAG $G$,

\[ V(I_G) \cap PD_n = V(C_G) \cap PD_n. \]

In the corollary, $PD_n \subset \mathbb{R}^{\binom{n+1}{2}}$ is the cone of $n \times n$ positive definite symmetric matrices. It
seems natural to ask whether or not $I_G = C_G$ for all DAGs $G$. For instance, this was true for
the DAG in Example 2.2. The Verma graph provides a natural counterexample.

Example 2.13. Let $G$ be the DAG on five vertices with edges $1 \to 3$, $1 \to 5$, $2 \to 3$, $2 \to 4$,
$3 \to 4$, and $4 \to 5$. This graph is often called the Verma graph.

The conditional independence statements implied by the model are all implied by the three
statements $1 \perp\!\!\!\perp 2$, $1 \perp\!\!\!\perp 4 \mid \{2,3\}$, and $\{2,3\} \perp\!\!\!\perp 5 \mid \{1,4\}$. Thus, the conditional independence ideal
$C_G$ is generated by one linear form and five determinantal cubics. In this case, we find that
$I_G = C_G + \langle f \rangle$ where $f$ is the degree four polynomial:

\begin{align*}
f = {} & \sigma_{23} \sigma_{24} \sigma_{25} \sigma_{34} - \sigma_{22} \sigma_{25} \sigma_{34}^2 - \sigma_{23} \sigma_{24}^2 \sigma_{35} + \sigma_{22} \sigma_{24} \sigma_{34} \sigma_{35} \\
& - \sigma_{23}^2 \sigma_{25} \sigma_{44} + \sigma_{22} \sigma_{25} \sigma_{33} \sigma_{44} + \sigma_{23}^2 \sigma_{24} \sigma_{45} - \sigma_{22} \sigma_{24} \sigma_{33} \sigma_{45}.
\end{align*}

We found that the primary decomposition of $C_G$ is

\[ C_G = I_G \cap \langle \sigma_{11}, \sigma_{12}, \sigma_{13}, \sigma_{14} \rangle, \]

so that $f$ is not even in the radical of $C_G$. Thus, the zero set of $C_G$ inside the positive semidefinite
cone contains singular covariance matrices that are not limits of distributions that belong to the
model. Note that since none of the indices of the $\sigma_{ij}$ appearing in $f$ contain 1, $f$ vanishes on
the marginal distribution for the random vector $(X_2, X_3, X_4, X_5)$. This is the Gaussian version
of what is often called the Verma constraint. Note that this computation shows that the Verma
constraint is still needed as a generator of the unmarginalized Verma model. □
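
The vanishing of the degree four polynomial on the model can be confirmed numerically. This is a sketch under stated assumptions: the function names are hypothetical, the parameter values are arbitrary and are not required to lie in the positive definite region (a polynomial identity holds for all parameter values), and the covariance entries are generated by the recursion $\sigma_{jj} = a_j$, $\sigma_{ij} = \sum_{k \in \mathrm{pa}(j)} \lambda_{kj} \sigma_{ik}$ for $i < j$.

```python
from fractions import Fraction


def covariance(n, lam, a):
    """sigma[i, j] (i <= j) from sigma_jj = a_j and the parent recursion."""
    s = {}
    for j in range(1, n + 1):
        parents = [k for k in range(1, j) if (k, j) in lam]
        for i in range(1, j):
            s[i, j] = sum(
                (lam[k, j] * s[min(i, k), max(i, k)] for k in parents),
                Fraction(0),
            )
        s[j, j] = Fraction(a[j])
    return s


def verma_quartic(s):
    """Degree four polynomial in the entries of Sigma with indices in {2,...,5}."""
    return (
        s[2, 3] * s[2, 4] * s[2, 5] * s[3, 4]
        - s[2, 2] * s[2, 5] * s[3, 4] ** 2
        - s[2, 3] * s[2, 4] ** 2 * s[3, 5]
        + s[2, 2] * s[2, 4] * s[3, 4] * s[3, 5]
        - s[2, 3] ** 2 * s[2, 5] * s[4, 4]
        + s[2, 2] * s[2, 5] * s[3, 3] * s[4, 4]
        + s[2, 3] ** 2 * s[2, 4] * s[4, 5]
        - s[2, 2] * s[2, 4] * s[3, 3] * s[4, 5]
    )


# Verma graph with arbitrary (hypothetical) parameter values.
lam = {(1, 3): 2, (1, 5): 3, (2, 3): 5, (2, 4): 7, (3, 4): 11, (4, 5): 13}
a = {1: 1, 2: 1, 3: 2, 4: 3, 5: 1}
s = covariance(5, lam, a)
value_on_model = verma_quartic(s)

# A perturbed matrix that no longer comes from the parametrization.
perturbed = dict(s)
perturbed[3, 5] += 1
value_off_model = verma_quartic(perturbed)
print(value_on_model, value_off_model)
```

The perturbed matrix serves only as a sanity check that the polynomial is not identically zero on all symmetric matrices.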

The rest of this paper is concerned with studying the ideals $I_G$ and investigating the
circumstances that guarantee that $C_G = I_G$. We report on results of a computational study in the
next section. Towards the end of the paper, we study the ideals $I_{G,O}$ that arise when some of
the random variables are hidden.

3. Computational Study 

Whenever approaching a new family of ideals, our first instinct is to compute as many
examples as possible to gain some intuition about the structure of the ideals. This section summarizes
the results of our computational explorations.

We used Macaulay2 [9] to compute the generating sets of all ideals $I_G$ for all DAGs $G$ on $n \le 6$
vertices. Our computational results concerning the problem of when $C_G = I_G$ are summarized
in the following proposition.

Proposition 3.1. All DAGs on $n \le 4$ vertices satisfy $C_G = I_G$. Of the 302 DAGs on $n = 5$
vertices, exactly 293 satisfy $C_G = I_G$. Of the 5984 DAGs on $n = 6$ vertices, exactly 4993 satisfy
$C_G = I_G$.

On $n = 5$ vertices, there were precisely nine graphs that fail to satisfy $C_G = I_G$. These
nine exceptional graphs are listed below. The numberings of the DAGs come from the Atlas of
Graphs [U]. Note that the Verma graph from Example 2.13 appears as $A_{218}$ after relabeling
vertices.



(1) $A_{139}$: $1 \to 4$, $1 \to 5$, $2 \to 4$, $3 \to 4$, $4 \to 5$.

(2) $A_{146}$: $1 \to 3$, $2 \to 3$, $2 \to 5$, $3 \to 4$, $4 \to 5$.

(3) $A_{197}$: $1 \to 2$, $1 \to 3$, $1 \to 5$, $2 \to 4$, $3 \to 4$, $4 \to 5$.

(4) $A_{216}$: $1 \to 2$, $1 \to 4$, $2 \to 3$, $2 \to 5$, $3 \to 4$, $4 \to 5$.

(5) $A_{217}$: $1 \to 3$, $1 \to 4$, $2 \to 4$, $2 \to 5$, $3 \to 4$, $4 \to 5$.

(6) $A_{218}$: $1 \to 3$, $1 \to 4$, $2 \to 3$, $2 \to 5$, $3 \to 4$, $4 \to 5$.

(7) $A_{275}$: $1 \to 2$, $1 \to 4$, $1 \to 5$, $2 \to 3$, $2 \to 5$, $3 \to 4$, $4 \to 5$.

(8) $A_{277}$: $1 \to 2$, $1 \to 3$, $1 \to 5$, $2 \to 4$, $3 \to 4$, $3 \to 5$, $4 \to 5$.

(9) $A_{292}$: $1 \to 2$, $1 \to 4$, $2 \to 3$, $2 \to 5$, $3 \to 4$, $3 \to 5$, $4 \to 5$.

The table below displays the numbers of minimal generators of different degrees for each of the
ideals $I_G$ where $G$ is one of the nine graphs on five vertices such that $C_G \neq I_G$. The coincidences
among rows in this table arise because sometimes two different graphs yield the same family of
probability distributions. This phenomenon is known as Markov equivalence [13, 17].



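Network     Degree 1   Degree 2   Degree 3   Degree 4   Degree 5
$A_{139}$      3          1          2          -          -
$A_{146}$      1          3          7          -          -
$A_{197}$      -          1          5          -          1
$A_{216}$      -          1          5          -          1
$A_{217}$      2          1          2          -          -
$A_{218}$      1          -          5          1          -
$A_{275}$      -          1          1          1          3
$A_{277}$      -          1          1          1          3
$A_{292}$      -          1          1          1          3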

It is worth noting the methods that we used to perform our computations, in particular,
how we computed generators for the ideals $I_G$. Rather than using the trek rule directly, and
computing the vanishing ideal of the parametrization, we exploited the recursive nature of the
parametrization to determine $I_G$. This is summarized by the following proposition.

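Proposition 3.2. Let $G$ be a DAG and $G \setminus n$ the DAG with vertex $n$ removed. Then

\[ I_G = \Big( I_{G \setminus n} + \Big\langle \sigma_{in} - \sum_{j \in \mathrm{pa}(n)} \lambda_{jn} \sigma_{ij} \ \Big|\ i \in [n-1] \Big\rangle \Big) \cap \mathbb{C}[\sigma_{ij} \mid i, j \in [n]], \]

where the ideal $I_{G \setminus n}$ is considered as the ideal of a graph on $n - 1$ vertices.

Proof. This is a direct consequence of the trek rule: every trek that goes to $n$ passes through a
parent of $n$ and cannot go below $n$. □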



Based on our (limited) computations up to $n = 6$ we propose some optimistic conjectures
about the structures of the ideals $I_G$.

Conjecture 3.3.

\[ I_G = C_G : \Big( \prod_{A \subset [n]} |\Sigma_{A,A}| \Big)^{\infty}. \]



Conjecture 3.3 says that all the uninteresting components of $C_G$ (that is, the components that
do not correspond to probability density functions) lie on the boundary of the positive definite
cone. Conjecture 3.3 was verified for all DAGs on $n \le 5$ vertices. Our computational evidence
also suggests that all the ideals $I_G$ are Cohen-Macaulay and normal, even for graphs with loops
and other complicated graphical structures.

Conjecture 3.4. The quotient ring $\mathbb{C}[\Sigma]/I_G$ is normal and Cohen-Macaulay for all $G$.

Conjecture 3.4 was verified computationally for all graphs on $n \le 5$ vertices and graphs with
$n = 6$ vertices and fewer than 8 edges. We prove Conjecture 3.4 when the underlying graph is a
tree in Section 5. A more negative conjecture concerns the graphs such that $I_G = C_G$.

Conjecture 3.5. The proportion of DAGs on $n$ vertices such that $I_G = C_G$ tends to zero as
$n \to \infty$.

To close the section, we provide a few useful propositions for reducing the computation of the
generating set of the ideal $I_G$ to the ideals for smaller graphs.

Proposition 3.6. Suppose that $G$ is a disjoint union of two subgraphs $G = G_1 \sqcup G_2$. Then

\[ I_G = I_{G_1} + I_{G_2} + \langle \sigma_{ij} \mid i \in V(G_1),\ j \in V(G_2) \rangle. \]

Proof. In the parametrization $\phi_G$, we have $\phi_G(\sigma_{ij}) = 0$ if $i \in V(G_1)$ and $j \in V(G_2)$, because
there is no trek from $i$ to $j$. Furthermore, $\phi_G(\sigma_{ij}) = \phi_{G_1}(\sigma_{ij})$ if $i, j \in V(G_1)$ and
$\phi_G(\sigma_{kl}) = \phi_{G_2}(\sigma_{kl})$ if $k, l \in V(G_2)$, and these polynomials are in disjoint sets of variables. Thus, there can
be no nontrivial relations involving both $\sigma_{ij}$ and $\sigma_{kl}$. □

Proposition 3.7. Let $G$ be a DAG with a vertex $m$ with no children and a decomposition into
two induced subgraphs $G = G_1 \cup G_2$ such that $V(G_1) \cap V(G_2) = \{m\}$. Then

\[ I_G = I_{G_1} + I_{G_2} + \langle \sigma_{ij} \mid i \in V(G_1) \setminus \{m\},\ j \in V(G_2) \setminus \{m\} \rangle. \]

Proof. In the parametrization $\phi_G$, we have $\phi_G(\sigma_{ij}) = 0$ if $i \in V(G_1) \setminus \{m\}$ and $j \in V(G_2) \setminus \{m\}$,
because there is no trek from $i$ to $j$. Furthermore, $\phi_G(\sigma_{ij}) = \phi_{G_1}(\sigma_{ij})$ if $i, j \in V(G_1)$ and
$\phi_G(\sigma_{kl}) = \phi_{G_2}(\sigma_{kl})$ if $k, l \in V(G_2)$, and these polynomials are in disjoint sets of variables unless
$i = j = k = l = m$. However, in this final case, $\phi_G(\sigma_{mm}) = a_m$ and this is the only occurrence
of $a_m$ in any of the expressions $\phi_G(\sigma_{ij})$. This is a consequence of the fact that vertex $m$ has no
children. Thus, we have a partition of the $\sigma_{ij}$ into three sets in which the $\phi_G(\sigma_{ij})$ appear in disjoint
sets of variables and there can be no nontrivial relations involving two or more of these sets of
variables. □

Proposition 3.8. Suppose that for all $i \in [n-1]$, the edge $i \to n \in E(G)$. Let $G \setminus n$ be the
DAG obtained from $G$ by removing the vertex $n$. Then

\[ I_G = I_{G \setminus n} \cdot \mathbb{C}[\sigma_{ij} : i, j \in [n]]. \]

Proof. Every vertex in $G \setminus n$ is connected to $n$ and is a parent of $n$. This implies that $n$ cannot
appear in any conditional independence statement implied by $G$. Furthermore, if $C$ d-separates
$A$ from $B$ in $G \setminus n$, it will d-separate $A$ from $B$ in $G$, because $n$ is below every vertex in $G \setminus n$. This
implies that the CI statements that hold for $G$ are precisely the same independence statements
that hold for $G \setminus n$. Thus

\[ V(C_G) \cap PD_n = V(C_{G \setminus n} \cdot \mathbb{C}[\sigma_{ij} \mid i, j \in [n]]) \cap PD_n. \]

Since $I_G = I(V(C_G) \cap PD_n)$, this implies the desired equality. □

4. Tetrad Representation Theorem 

An important step towards understanding the ideals $I_G$ is to derive interpretations of the
polynomials in $I_G$. We have an interpretation for a large part of $I_G$, namely, the subideal
$C_G \subseteq I_G$. Conversely, we can ask when polynomials of a given form belong to the ideals $I_G$.
Clearly, any linear polynomial in $I_G$ is a linear combination of polynomials of the form $\sigma_{ij}$ with
$i \neq j$, all of which must also belong to $I_G$. Each linear polynomial $\sigma_{ij}$ corresponds to the
independence statement $X_i \perp\!\!\!\perp X_j$. Combinatorially, the linear form $\sigma_{ij}$ is in $I_G$ if and only if
there is no trek from $i$ to $j$ in $G$.

A stronger result of this form is the tetrad representation theorem, first proven in [17], which
gives a combinatorial characterization of when a tetrad difference

\[ \sigma_{ij} \sigma_{kl} - \sigma_{il} \sigma_{jk} \]

belongs to the ideal $I_G$. The constraints do not necessarily correspond to conditional
independence statements, and need not belong to the ideal $C_G$. This will be illustrated in Example
4.6.

The original proof of the tetrad representation theorem in [17] is quite long and technical. Our
goal in this section is to show how our algebraic perspective can be used to greatly simplify the
proof. We also include this result here because we will need the tetrad representation theorem
in Section 6.

Definition 4.1. A vertex $c \in V(G)$ is a choke point between sets $I$ and $J$ if every trek from a
point in $I$ to a point in $J$ contains $c$ and either

(1) $c$ is on the $I$-side of every trek from $I$ to $J$, or

(2) $c$ is on the $J$-side of every trek from $I$ to $J$.

The set of all choke points in $G$ between $I$ and $J$ is denoted $C(I, J)$.

Example 4.2. In the graph c is a choke point between {1,4} and {2,3}, but is not a choke 
point between {1,2} and {3,4}. 

Theorem 4.3 (Tetrad Representation Theorem [17]). The tetrad constraint $\sigma_{ij} \sigma_{kl} - \sigma_{il} \sigma_{jk} = 0$
holds for all covariance matrices in the Bayesian network associated to $G$ if and only if there is
a choke point in $G$ between $\{i, k\}$ and $\{j, l\}$.
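
As a small worked illustration (our own example, not taken from the text), consider the DAG with edges $1 \to 2$, $1 \to 3$, $1 \to 4$, $1 \to 5$. For distinct $i, j \in \{2, 3, 4, 5\}$ the only trek from $i$ to $j$ goes up to vertex 1 and back down, so vertex 1 is the top of every trek between $\{2, 4\}$ and $\{3, 5\}$ and is a choke point between these sets. The trek rule then gives the tetrad directly:

```latex
\begin{align*}
\sigma_{ij}(a, \lambda) &= a_1 \lambda_{1i} \lambda_{1j}
  \qquad (i \neq j,\ i, j \in \{2, 3, 4, 5\}), \\
\sigma_{23} \sigma_{45} - \sigma_{25} \sigma_{34}
  &= (a_1 \lambda_{12} \lambda_{13})(a_1 \lambda_{14} \lambda_{15})
   - (a_1 \lambda_{12} \lambda_{15})(a_1 \lambda_{13} \lambda_{14}) = 0.
\end{align*}
```

Here $i = 2$, $j = 3$, $k = 4$, $l = 5$ in the notation of the theorem.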

Our proof of the tetrad representation theorem will follow after a few lemmas that lead to
the irreducible factorization of the polynomials $\sigma_{ij}(a, \lambda)$.

Lemma 4.4. In a fixed DAG $G$, every trek from $I$ to $J$ is incident to every choke point in
$C(I, J)$ and they must be reached always in the same order.

Proof. If two choke points are on, say, the $I$ side of every trek from $I$ to $J$ and there are two
treks which reach these choke points in different orders, there will be a directed cycle in $G$. If
the choke points $c_1$ and $c_2$ were on the $I$ side and $J$ side, respectively, and there were two treks
from $I$ to $J$ that reached them in a different order, this would contradict the property of being
a choke point. □

Lemma 4.5. Let $i = c_0, c_1, \ldots, c_k = j$ be the ordered choke points in $C(\{i\}, \{j\})$. Then the
irreducible factorization of $\sigma_{ij}(a, \lambda)$ is

\[ \sigma_{ij}(a, \lambda) = \prod_{t=1}^{k} f^t_{ij}(a, \lambda), \]

where $f^t_{ij}(a, \lambda)$ only depends on $\lambda_{pq}$ such that $p$ and $q$ are between choke points $c_{t-1}$ and $c_t$.

Proof. First of all, we will show that $\sigma_{ij}(a, \lambda)$ has a factorization as indicated. Then we will
show that the factors are irreducible. Define

\[ f^t_{ij}(a, \lambda) = \sum_{P \in T(i,j;c_{t-1},c_t)} a_{\mathrm{top}(P)} \prod_{k \to l \in P} \lambda_{kl}, \]

where $T(i,j;c_{t-1},c_t)$ consists of all paths from $c_{t-1}$ to $c_t$ that are partial treks from $i$ to $j$ (that
is, that can be completed to a trek from $i$ to $j$) and $a_{\mathrm{top}(P)} = 1$ if the partial trek $P$ does not
contain the top of the trek. When deciding whether or not the top is included in the partial trek, note
that almost all choke points are associated with either the $\{i\}$ side or the $\{j\}$ side. So there is
a natural way to decide if $a_{\mathrm{top}(P)}$ is included or not. In the exceptional case that $c$ is a choke
point on both the $\{i\}$ and the $\{j\}$ side, we repeat this choke point in the list. This is because $c$
must be the top of every trek from $i$ to $j$, and we will get a factor $f^t_{ij}(a, \lambda) = a_c$.

Since each $c_t$ is a choke point between $i$ and $j$, the product of the monomials, one from each
$f^t_{ij}$, is the monomial corresponding to a trek from $i$ to $j$. Conversely, every monomial arises as
such a product in a unique way. This proves that the desired factorization holds.

Now we will show that each of the $f^t_{ij}(a, \lambda)$ cannot factorize further. Note that every monomial
in $f^t_{ij}(a, \lambda)$ is squarefree in all the $a$ and $\lambda$ indeterminates. This means that every monomial
appearing in $f^t_{ij}(a, \lambda)$ is a vertex of the Newton polytope of $f^t_{ij}(a, \lambda)$. This, in turn, implies
that in any factorization $f^t_{ij}(a, \lambda) = fg$ there is no cancellation, since in any factorization of any
polynomial, each vertex of the Newton polytope is the product of two vertices of the constituent
Newton polytopes. This means that in any factorization $f^t_{ij}(a, \lambda) = fg$, $f$ and $g$ can be chosen
to be the sums of squarefree monomials all with coefficient 1.

Now let $f^t_{ij}(a, \lambda) = fg$ be any factorization and let $m$ be a monomial appearing in $f^t_{ij}(a, \lambda)$.
If the factorization is nontrivial, $m = m_f m_g$ where $m_f$ and $m_g$ are monomials in $f$ and $g$
respectively. Since the factorization is nontrivial and $m$ corresponds to a partial trek $P$ in
$T(i,j;c_{t-1},c_t)$, there must exist a vertex $c$ on $P$ such that, without loss of generality, $\lambda_{pc}$
appears in $m_f$ and $\lambda_{cq}$ appears in $m_g$. Since every monomial in the expansion of $fg$ corresponds
to a partial trek from $c_{t-1}$ to $c_t$, it must be the case that every monomial in $f$ contains an
indeterminate $\lambda_{sc}$ for some $s$ and similarly, every monomial appearing in $g$ contains a $\lambda_{cs}$ for
some $s$. But this implies that every partial trek from $c_{t-1}$ to $c_t$ passes through $c$, with the same
directionality, that is, it is a choke point between $i$ and $j$. However, this contradicts the fact that
$C(\{i\}, \{j\}) = \{c_0, \ldots, c_k\}$. □

Proof of the tetrad representation theorem. Suppose that the vanishing tetrad condition holds, that is,

$$\sigma_{ij}\sigma_{kl} = \sigma_{il}\sigma_{kj}$$

for all covariance matrices in the model. This factorization must thus also hold when we
substitute the polynomial expressions in the parametrization:

$$\sigma_{ij}(a,\lambda)\,\sigma_{kl}(a,\lambda) = \sigma_{il}(a,\lambda)\,\sigma_{kj}(a,\lambda).$$

Assuming that none of these polynomials are zero (in which case the choke condition is satisfied
for trivial reasons), this means that each factor $f^{t}_{ij}(a,\lambda)$ must appear on both the left and the
right-hand sides of this expression. This is a consequence of the fact that polynomial rings
over fields are unique factorization domains. The first factor $f^{1}_{ij}(a,\lambda)$ could only be a factor of
$\sigma_{il}(a,\lambda)$. There exists a unique $t \geq 1$ such that $f^{1}_{ij} \cdots f^{t}_{ij}$ divides $\sigma_{il}$ but
$f^{1}_{ij} \cdots f^{t+1}_{ij}$ does not divide $\sigma_{il}$. This implies that $f^{t+1}_{ij}$ divides $\sigma_{kj}$. However, this
implies that $c_t$ is a choke point between $i$ and $j$, between $i$ and $l$, and between $k$ and $j$.
Furthermore, this will imply that $c_t$ is a choke point between $k$ and $l$ as well, which implies that
$c_t$ is a choke point between $\{i,k\}$ and $\{j,l\}$.

Conversely, suppose that there is a choke point $c$ between $\{i,k\}$ and $\{j,l\}$. Our unique
factorization of the $\sigma_{ij}$ implies that we can write

$$\sigma_{ij} = f_1 g_1, \quad \sigma_{kl} = f_2 g_2, \quad \sigma_{il} = f_1 g_2, \quad \sigma_{kj} = f_2 g_1$$

where $f_1$ and $f_2$ correspond to partial treks from $i$ to $c$ and $k$ to $c$, respectively, and $g_1$ and $g_2$
correspond to partial treks from $c$ to $j$ and $l$, respectively. Then we have

$$\sigma_{ij}\sigma_{kl} = f_1 g_1 f_2 g_2 = \sigma_{il}\sigma_{kj},$$

so that $\Sigma$ satisfies the tetrad constraint. □
At first glance, it is tempting to suggest that the tetrad representation theorem says that 
a tetrad vanishes for every covariance matrix in the model if and only if an associated condi- 
tional independence statement holds. Unfortunately, this is not true, as the following example 
illustrates. 

Example 4.6. Let $A_{139}$ be the graph with edges $1 \to 4$, $1 \to 5$, $2 \to 4$, $3 \to 4$ and $4 \to 5$. Then
4 is a choke point between $\{2,3\}$ and $\{4,5\}$ and the tetrad $\sigma_{24}\sigma_{35} - \sigma_{25}\sigma_{34}$ belongs to $I_{A_{139}}$.
However, it is not implied by the conditional independence statements implied by the graph
(that is, $\sigma_{24}\sigma_{35} - \sigma_{25}\sigma_{34} \notin C_{A_{139}}$). It is precisely this extra tetrad constraint that forces $A_{139}$
onto the list of graphs that satisfy $C_G \neq I_G$ from Section 3.
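
The vanishing of this tetrad can be checked at a sample parameter point. The following is a minimal sketch, assuming the trek parametrization in which $\sigma_{ij}$ is the sum over simple treks of $a_{\mathrm{top}(P)}$ times the edge weights; the helper names `directed_paths` and `sigma` and the parameter values are hypothetical, and exact vanishing at one rational point only illustrates the identity, it does not prove it.

```python
from fractions import Fraction

# Edges (parent, child) of the graph in Example 4.6.
EDGES = [(1, 4), (1, 5), (2, 4), (3, 4), (4, 5)]
NODES = sorted({v for edge in EDGES for v in edge})


def directed_paths(src, dst):
    """Return every directed path src -> ... -> dst as a list of vertices."""
    if src == dst:
        return [[src]]
    paths = []
    for u, v in EDGES:
        if u == src:
            paths.extend([src] + rest for rest in directed_paths(v, dst))
    return paths


def sigma(i, j, a, lam):
    """Sum over simple treks between i and j of a[top] times the edge weights."""
    total = Fraction(0)
    for top in NODES:
        for left in directed_paths(top, i):
            for right in directed_paths(top, j):
                if set(left) & set(right) != {top}:
                    continue  # the two legs of a trek may share only the top
                term = a[top]
                for leg in (left, right):
                    for edge in zip(leg, leg[1:]):
                        term *= lam[edge]
                total += term
    return total


# Arbitrary nonzero rational parameters (hypothetical values).
a = {v: Fraction(v + 1, 3) for v in NODES}
lam = {edge: Fraction(k + 2, 7) for k, edge in enumerate(EDGES)}

# Tetrad from the example: the choke point 4 separates {2,3} from {4,5}.
tetrad = sigma(2, 4, a, lam) * sigma(3, 5, a, lam) - sigma(2, 5, a, lam) * sigma(3, 4, a, lam)
# A tetrad with no choke point behind it, for contrast.
other = sigma(1, 4, a, lam) * sigma(2, 5, a, lam) - sigma(1, 5, a, lam) * sigma(2, 4, a, lam)
print(tetrad, other)
```

At this point `tetrad` is exactly zero, while `other` is not, since vertex 1 reaches 5 both directly and through 4.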
In particular, a choke point between two sets need not be a d-separator of those sets. In the
case that G is a tree, it is true that tetrad constraints are conditional independence constraints. 

Proposition 4.7. Let $T$ be a tree and suppose that $c$ is a choke point between $I$ and $J$ in $T$.
Then either $\{c\}$ d-separates $I \setminus \{c\}$ and $J \setminus \{c\}$, or $\emptyset$ d-separates $I \setminus \{c\}$ and $J \setminus \{c\}$.

Proof. Since $T$ is a tree, there is a unique path from an element in $I \setminus \{c\}$ to an element in $J \setminus \{c\}$.
If this path is not a trek, then $\emptyset$ d-separates $I \setminus \{c\}$ from $J \setminus \{c\}$. On the other hand, if this
path is always a trek, we see that $\{c\}$ d-separates $I \setminus \{c\}$ from $J \setminus \{c\}$. □

The tetrad representation theorem gives a simple combinatorial rule for determining when a
$2 \times 2$ minor of $\Sigma$ is in $I_G$. More generally, we believe that there should exist a graph theoretic
rule that determines when a general determinant $|\Sigma_{A,B}| \in I_G$ in terms of structural features of
the DAG $G$. The technique we have used above, which relies on giving a factorization of the
polynomials $\sigma_{ij}(a,\lambda)$, does not seem like it will extend to higher order minors. One approach
at a generalization of the tetrad representation theorem would be to find a cancellation free
expression for the determinant $|\Sigma_{A,B}|$ in terms of the parameters $a_i$ and $\lambda_{ij}$, along the lines of
the Gessel-Viennot theorem [8]. From such a result, one could deduce a combinatorial rule for
when $|\Sigma_{A,B}|$ is zero. This suggests the following problem.

Problem 4.8. Develop a Gessel-Viennot theorem for treks; that is, determine a combinatorial
formula for the expansion of $|\Sigma_{A,B}|$ in terms of the treks in $G$.



5. Fully Observed Trees 

In this section we study the Bayesian networks of trees in the situation where all random
variables are observed. We show that the toric ideal $I_T$ is generated by linear forms $\sigma_{ij}$ and
quadratic tetrad constraints. The Tetrad Representation Theorem and Proposition 4.7 then
imply that $I_T = C_T$. We also investigate further algebraic properties of the ideals $I_T$ using the
fact that $I_T$ is a toric ideal and some techniques from polyhedral geometry.

For the rest of this section, we assume that T is a tree, where by a tree we mean a DAG 
whose underlying undirected graph is a tree. These graphs are sometimes called polytrees in 
the graphical models literature. A directed tree is a tree all of whose edges are directed away 
from a given source vertex. 

Since $I_T$ is a toric ideal, it can be analyzed using techniques from polyhedral geometry. In
particular, for each $i, j$ such that $T(i,j)$ is nonempty, let $\mathbf{a}_{ij}$ denote the exponent vector of the
monomial $\sigma_{ij} = a_{\mathrm{top}(P)} \prod_{k \to l \in P} \lambda_{kl}$. Let $A_T$ denote the set of all these exponent vectors. The
geometry of the toric variety $V(I_T)$ is determined by the discrete geometry of the polytope
$P_T = \mathrm{conv}(A_T)$.

The polytope $P_T$ is naturally embedded in $\mathbb{R}^{2n-1}$, where $n$ of the coordinates on $\mathbb{R}^{2n-1}$
correspond to the vertices of $T$ and $n - 1$ of the coordinates correspond to the edges of $T$.
Denote the first set of coordinates by $x_i$ and the second by $y_{ij}$ where $i \to j$ is an edge in $T$. Our
first result is a description of the facet structure of the polytope $P_T$.



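Theorem 5.1. The polytope $P_T$ is the solution to the following set of equations and inequalities:

$$\begin{aligned}
x_i &\geq 0 && \text{for all } i \in V(T), \\
y_{ij} &\geq 0 && \text{for all } i \to j \in E(T), \\
\sum_{i \in V(T)} x_i &= 1, \\
x_j + \sum_{i:\, i \to j \in E(T)} y_{ij} - y_{jk} &\geq 0 && \text{for all } j \to k \in E(T), \\
2x_j + \sum_{i:\, i \to j \in E(T)} y_{ij} - \sum_{k:\, j \to k \in E(T)} y_{jk} &\geq 0 && \text{for all } j \in V(T).
\end{aligned}$$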

Proof. Let $Q_T$ denote the polyhedron defined as the solution space to the given constraints.
First of all, $Q_T$ is bounded. To see this, first note that because of the positivity constraints and
the equation $\sum_{i \in V(T)} x_i = 1$, we have that $x_i \leq 1$ is implied by the given constraints. Then,
starting from the sources of the tree and working our way down the edges, repeatedly using the
inequalities $x_j + \sum_{i:\, i \to j} y_{ij} - y_{jk} \geq 0$, we see that the $y_{ij}$ are also bounded.

Now, we have $P_T \subseteq Q_T$, since every trek will satisfy any of the indicated constraints. Thus,
we must show that $Q_T \subseteq P_T$. To do this, it suffices to show that for any vector $(x^0,y^0) \in Q_T$,
there exist $\lambda > 0$, $(x^1,y^1)$ and $(x^2,y^2)$ such that

$$(x^0,y^0) = \lambda (x^1,y^1) + (1-\lambda)(x^2,y^2)$$

where $(x^1,y^1)$ is one of the 0/1 vectors $\mathbf{a}_{ij}$ and $(x^2,y^2) \in Q_T$. Because $Q_T$ is bounded, this
will imply that the extreme points of $Q_T$ are a subset of the extreme points of $P_T$, and hence
$Q_T \subseteq P_T$. Without loss of generality we may suppose that all of the coordinates $y^0_{ij}$ are positive,
otherwise the problem reduces to a smaller tree or forest because the resulting inequalities that
arise when $y_{ij} = 0$ are precisely those that are necessary for the smaller tree. Note that for
a forest $F$, the polytope $P_F$ is the direct join of polytopes $P_T$ as $T$ ranges over the connected
components of $F$, by Proposition 3.6.



For any fixed $j$, there cannot exist distinct values $k_1$, $k_2$, and $k_3$ such that all of

$$\begin{aligned}
x^0_j + \sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_1} &= 0, \\
x^0_j + \sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_2} &= 0, \\
x^0_j + \sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_3} &= 0
\end{aligned}$$

hold. If there were, we could add these three equations together to deduce that

$$3x^0_j + 3\sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_1} - y^0_{jk_2} - y^0_{jk_3} = 0.$$

This in turn implies that

$$2x^0_j + \sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_1} - y^0_{jk_2} - y^0_{jk_3} \leq 0$$

with equality if and only if $\mathrm{pa}(j) = \emptyset$ and $x^0_j = 0$. This in turn implies that, for instance,
$y^0_{jk_1} = 0$, contradicting our assumption that $y^0_{ij} > 0$ for all $i$ and $j$. By a similar argument, if
exactly two of these facet defining inequalities hold sharply, we see that

$$2x^0_j + 2\sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk_1} - y^0_{jk_2} = 0$$

which implies that $j$ has exactly two children and no parents.

Now mark each edge $j \to k$ in the tree $T$ such that

$$x^0_j + \sum_{i:\, i \to j \in E(T)} y^0_{ij} - y^0_{jk} = 0.$$

By the preceding paragraph, we can find a trek $P$ from a sink in the tree to a source in the tree
and (possibly) back to a different sink that has the property that for no $i$ in the trek there exists
$k$ not in the path such that $i \to k$ is a marked edge. That is, the preceding paragraph shows
that there can be at most 2 marked edges incident to any given vertex.

Given $P$, let $(x^1,y^1)$ denote the corresponding 0/1 vector. We claim that there is a $\lambda > 0$
such that

(1) $(x^0,y^0) = \lambda (x^1,y^1) + (1-\lambda)(x^2,y^2)$

holds with $(x^2,y^2) \in Q_T$. Take $\lambda > 0$ to be any very small number and define $(x^2,y^2)$ by the
given equation. Note that by construction the inequalities $x^2_i \geq 0$ and $y^2_{ij} \geq 0$ will be satisfied
since for all the nonzero entries in $(x^1,y^1)$ the corresponding inequality for $(x^0,y^0)$ must have
been nonstrict and $\lambda$ is small. Furthermore, the constraint $\sum_{i \in V(T)} x^2_i = 1$ is also automatically
satisfied. It is also easy to see that the last set of inequalities will also be satisfied since through
each vertex the path will either have no edges, an incoming edge and an outgoing edge, or two
outgoing edges and the top vertex, all of which do not change the value of the linear functional.
Finally, to see that the inequalities of the form

$$x_j + \sum_{i:\, i \to j \in E(T)} y_{ij} - y_{jk} \geq 0$$

are still satisfied by $(x^2,y^2)$, note that marked edges of $T$ are either contained in the path
$P$ or not incident to the path $P$. Thus, the strict inequalities remain strict (since they will
involve modifying by an incoming edge and an outgoing edge or an outgoing edge and the top
vertex), and the nonstrict inequalities remain nonstrict since $\lambda$ is small. Thus, we conclude that
$Q_T \subseteq P_T$, which completes the proof. □
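
The easy inclusion $P_T \subseteq Q_T$ can be checked mechanically on a small example. The sketch below is illustrative only: the polytree, the helper names, and the exact form of the constraints (nonnegativity, $\sum_i x_i = 1$, one inequality per edge and one per vertex, as we read Theorem 5.1) are assumptions of this example. It verifies that every trek vector satisfies the constraints; it does not test the harder inclusion $Q_T \subseteq P_T$.

```python
# A small polytree (hypothetical example), edges listed as (parent, child).
EDGES = [(1, 3), (2, 3), (3, 4), (3, 5)]
NODES = [1, 2, 3, 4, 5]


def directed_paths(src, dst):
    """Return every directed path src -> ... -> dst as a list of vertices."""
    if src == dst:
        return [[src]]
    paths = []
    for u, v in EDGES:
        if u == src:
            paths.extend([src] + rest for rest in directed_paths(v, dst))
    return paths


def trek_vectors():
    """0/1 vectors (x, y) of all treks, including the trivial trek at each vertex."""
    vectors = []
    for i in NODES:
        for j in NODES:
            if i > j:
                continue
            for top in NODES:
                for left in directed_paths(top, i):
                    for right in directed_paths(top, j):
                        if set(left) & set(right) != {top}:
                            continue  # legs may share only the top
                        used = set(zip(left, left[1:])) | set(zip(right, right[1:]))
                        x = {v: int(v == top) for v in NODES}
                        y = {e: int(e in used) for e in EDGES}
                        vectors.append((x, y))
    return vectors


def satisfies_constraints(x, y):
    """Check the equations and inequalities of Theorem 5.1 (as read here)."""
    def incoming(j):
        return sum(y[e] for e in EDGES if e[1] == j)

    def outgoing(j):
        return sum(y[e] for e in EDGES if e[0] == j)

    checks = [all(x[v] >= 0 for v in NODES), all(y[e] >= 0 for e in EDGES)]
    checks.append(sum(x.values()) == 1)
    checks += [x[j] + incoming(j) - y[(j, k)] >= 0 for (j, k) in EDGES]
    checks += [2 * x[j] + incoming(j) - outgoing(j) >= 0 for j in NODES]
    return all(checks)


vectors = trek_vectors()
print(len(vectors), all(satisfies_constraints(x, y) for x, y in vectors))
```

The pair $(1,2)$ contributes no vector, because the only path between 1 and 2 has a collider at 3 and is therefore not a trek.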

Corollary 5.2. Let $\prec$ be any reverse lexicographic term order such that $\sigma_{ii} \succ \sigma_{jk}$ for all $i$ and
$j \neq k$. Then $\mathrm{in}_\prec(I_T)$ is squarefree. In other words, the associated pulling triangulation of $P_T$ is
unimodular.

Proof. The proof is purely polyhedral, and relies on the geometric connections between
triangulations and initial ideals of toric ideals. See Chapter 8 in [19] for background on this material
including pulling triangulations. Let $\mathbf{a}_{ij}$ denote the vertex of $P_T$ corresponding to the monomial
$\phi_G(\sigma_{ij})$. For $i \neq j$, each of the vertices $\mathbf{a}_{ij}$ has lattice distance at most one from any of the
facets described by Theorem 5.1. This is seen by evaluating each of the linear functionals at the
0/1 vector corresponding to the trek between $i$ and $j$.

If we pull from one of these vertices we get a unimodular triangulation provided that the
induced pulling triangulation on each of the facets of $P_T$ not containing $\mathbf{a}_{ij}$ is unimodular. This
is because the normalized volume of a simplex is the volume of the base times the lattice distance
from the base to the vertex not on the base.

The facet defining inequalities of any face of $P_T$ are obtained by taking an appropriate subset
of the facet defining inequalities of $P_T$. Thus, as we continue the pulling triangulation, if the
current face contains a vertex $\mathbf{a}_{ij}$ with $i \neq j$, we will pull from this vertex first and get a
unimodular pulling triangulation provided the induced pulling triangulation of every face is
unimodular. Thus, by induction, it suffices to show that the faces of $P_T$ that are the convex
hull of vertices $\mathbf{a}_{ii}$ have unimodular pulling triangulations. However, these faces are always
unimodular simplices. □

Corollary 5.3. The ring $\mathbb{C}[\Sigma]/I_T$ is normal and Cohen-Macaulay when $T$ is a tree.

Proof. Since $P_T$ has a unimodular triangulation, it is a normal polytope and hence the semigroup
ring $\mathbb{C}[\Sigma]/I_T$ is normal. Hochster's theorem [10] then implies that $\mathbb{C}[\Sigma]/I_T$ is Cohen-Macaulay. □

While we know that $\mathbb{C}[\Sigma]/I_T$ is always Cohen-Macaulay, it remains to determine how the
Cohen-Macaulay type of $I_T$ depends on the underlying tree $T$. Here is a concrete conjecture
concerning the special case of Gorenstein trees.

Conjecture 5.4. Suppose that $T$ is a directed tree. Then $\mathbb{C}[\Sigma]/I_T$ is Gorenstein if and only if
the degree of every vertex in $T$ is less than or equal to three.

A downward directed tree is a tree all of whose edges point to the unique sink in the tree. A 
leaf of such a downward directed tree is then a source of the tree. With a little more refined 
information about which inequalities defining Pt are facet defining, we can deduce results about 
the degrees of the ideals It in some cases. 

Corollary 5.5. Let $T$ be a downward directed tree and let $i$ be any leaf of $T$, $s$ the sink of $T$,
and $P$ the unique trek in $T(i,s)$. Then

$$\deg I_T = \sum_{k \to l \in P} \deg I_{T \setminus k \to l}$$

where $T \setminus k \to l$ denotes the forest obtained from $T$ by removing the edge $k \to l$.

Proof. First of all, note that in the case of a downward directed tree the inequalities of the form

$$2x_j + \sum_{i:\, i \to j \in E(T)} y_{ij} - \sum_{k:\, j \to k \in E(T)} y_{jk} \geq 0$$

are redundant: since each vertex has at most one child, each is implied by the other
constraints. Also, for any source $t$, the inequality $x_t \geq 0$ is redundant, because it is implied by
the inequalities $x_t - y_{tj} \geq 0$ and $y_{tj} \geq 0$, where $j$ is the unique child of $t$.

Now we will compute the normalized volume of the polytope $P_T$ (which is equal to the degree
of the toric ideal $I_T$) by computing the pulling triangulation from Corollary 5.2 and relating the
volumes of the pieces to the associated subforests.

Since the pulling triangulation of $P_T$ with $\mathbf{a}_{is}$ pulled first is unimodular, the volume of $P_T$ is
the sum of the volumes of the facets of $P_T$ that do not contain $\mathbf{a}_{is}$. Note that $\mathbf{a}_{is}$ lies on all the
facets of the form

$$x_j + \sum_{i:\, i \to j \in E(T)} y_{ij} - y_{jk} \geq 0$$

since through every vertex besides the source and sink, the trek has either zero or two edges
incident to it. Thus, the only facets that $\mathbf{a}_{is}$ does not lie on are of the form $y_{kl} \geq 0$ such that
$k \to l$ is an edge in the trek $P$. However, the facet of $P_T$ obtained by setting $y_{kl} = 0$ is precisely
the polytope $P_{T \setminus k \to l}$, which follows from Theorem 5.1. □



Note that upon removing an edge in a tree we obtain a forest. Proposition 3.6 implies that
the degree of such a forest is the product of the degrees of the associated trees. Since the degree
of the tree consisting of a single point is one, the formula from Corollary 5.5 yields a recursive
expression for the degree of a downward directed forest.



Corollary 5.6. Let $T_n$ be the directed chain with $n$ vertices. Then $\deg I_{T_n} = \frac{1}{n}\binom{2n-2}{n-1}$, the
$(n-1)$st Catalan number.

Proof. In Corollary 5.5 we take the unique path from 1 to $n$. The resulting forests obtained
by removing an edge are the disjoint unions of two paths. By the product formula implied by
Proposition 3.6 we deduce that the degree of $I_{T_n}$ satisfies the recurrence:

$$\deg I_{T_n} = \sum_{i=1}^{n-1} \deg I_{T_i} \cdot \deg I_{T_{n-i}}$$

with initial condition $\deg I_{T_1} = 1$. This is precisely the recurrence and initial conditions for the
Catalan numbers [18]. □
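
The recurrence is easy to run for small $n$. The following sketch (function names are ours, not from the text) iterates the recurrence $\deg I_{T_n} = \sum_{i=1}^{n-1} \deg I_{T_i} \cdot \deg I_{T_{n-i}}$ with $\deg I_{T_1} = 1$ and compares the result with the closed form $\frac{1}{n}\binom{2n-2}{n-1}$.

```python
from math import comb


def chain_degrees(n_max):
    """Degrees of the chain ideals for n = 1..n_max, computed from the recurrence."""
    deg = {1: 1}
    for n in range(2, n_max + 1):
        deg[n] = sum(deg[i] * deg[n - i] for i in range(1, n))
    return deg


def catalan(m):
    """The m-th Catalan number, binom(2m, m) / (m + 1)."""
    return comb(2 * m, m) // (m + 1)


degrees = chain_degrees(8)
for n in sorted(degrees):
    print(n, degrees[n], catalan(n - 1))
```

For instance, the chain on four vertices gives $1 \cdot 2 + 1 \cdot 1 + 2 \cdot 1 = 5$, the third Catalan number.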



Now we want to prove the main result of this section, that the determinantal conditional 
independence statements actually generate the ideal It when T is a tree. To do this, we will 
exploit the underlying toric structure, introduce a tableau notation for working with monomials, 
and introduce an appropriate ordering of the variables. 

Each variable $\sigma_{ij}$ that is not zero can be identified with the unique trek in $T$ from $i$ to $j$.
We associate to $\sigma_{ij}$ the tableau which records the elements of $T$ in this unique trek, which is
represented like this:

$$\sigma_{ij} = [aBi \,|\, aCj]$$

where $B$ and $C$ are (possibly empty) strings. If, say, $i$ were at the top of the path, we would
write the tableau as

$$\sigma_{ij} = [i \,|\, iCj].$$

The tableau is in its standard form if $aBi$ is lexicographically earlier than $aCj$. We introduce a
lexicographic total order on standard form tableau variables by declaring $[aA \,|\, aB] \prec [cC \,|\, cD]$ if
$aA$ is lexicographically smaller than $cC$, or if $aA = cC$ and $aB$ is lexicographically smaller than
$cD$. Given a monomial, its tableau representation is the row-wise concatenation of the tableau
forms of each of the variables appearing in the monomial.

Example 5.7. Let $T$ be the tree with edges $1 \to 3$, $1 \to 4$, $2 \to 4$, $3 \to 5$, $3 \to 6$, $4 \to 7$,
and $4 \to 8$. Then the monomial $\sigma_{14}\sigma_{18}\sigma_{24}\sigma_{34}^2\sigma_{38}\sigma_{57}\sigma_{78}$ has the standard form lexicographically
ordered tableau:

$$\left[\begin{array}{c|c}
1 & 14 \\
1 & 148 \\
13 & 14 \\
13 & 14 \\
13 & 148 \\
135 & 147 \\
2 & 24 \\
47 & 48
\end{array}\right]$$

Note that if a variable appears to the $d$-th power in a monomial, the representation for this
variable is repeated as $d$ rows in the tableau. □
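
The tableau of Example 5.7 can be produced mechanically. This is a minimal sketch under two assumptions of ours: vertices are single digits, so that a trek leg can be written as a string, and a monomial is given as a list of index pairs with repetition for powers. The helper names are hypothetical.

```python
# Tree of Example 5.7, edges listed as (parent, child).
EDGES = [(1, 3), (1, 4), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8)]
NODES = sorted({v for edge in EDGES for v in edge})


def directed_paths(src, dst):
    """Return every directed path src -> ... -> dst as a list of vertices."""
    if src == dst:
        return [[src]]
    paths = []
    for u, v in EDGES:
        if u == src:
            paths.extend([src] + rest for rest in directed_paths(v, dst))
    return paths


def tableau_row(i, j):
    """Standard-form row for sigma_ij, or None when no trek joins i and j."""
    for top in NODES:
        for left in directed_paths(top, i):
            for right in directed_paths(top, j):
                if set(left) & set(right) == {top}:
                    words = ["".join(map(str, leg)) for leg in (left, right)]
                    # Standard form: the lexicographically earlier leg comes first.
                    return tuple(sorted(words))
    return None


def tableau(monomial):
    """Rows of the monomial, lexicographically ordered."""
    return sorted(tableau_row(i, j) for i, j in monomial)


# sigma_14 sigma_18 sigma_24 sigma_34^2 sigma_38 sigma_57 sigma_78
monomial = [(1, 4), (1, 8), (2, 4), (3, 4), (3, 4), (3, 8), (5, 7), (7, 8)]
rows = tableau(monomial)
for left, right in rows:
    print(f"[{left} | {right}]")
```

Since the squared variable $\sigma_{34}$ is listed twice in `monomial`, its row `[13 | 14]` appears twice in the output.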

When we write out general tableau, lower-case letters will always correspond to single char- 
acters (possibly empty) and upper case letters will always correspond to strings of characters 
(also, possibly empty). 

Theorem 5.8. For any tree $T$, the conditional independence statements implied by $T$ generate
$I_T$. In particular, $I_T$ is generated by linear polynomials $\sigma_{ij}$ and quadratic tetrad constraints.

Proof. First of all, we can ignore the linear polynomials, as they always correspond to
independence constraints, and work modulo these linear constraints when working with the toric ideal
$I_T$. In addition, every quadratic binomial of the form $\sigma_{ij}\sigma_{kl} - \sigma_{il}\sigma_{kj}$ that belongs to $I_T$ is
implied by a conditional independence statement. This follows from Proposition 4.7. Note that
this holds even if the set $\{i,j,k,l\}$ does not have four elements. Thus, it suffices to show that
$I_T$ modulo the linear constraints is generated by quadratic binomials.

To show that $I_T$ is generated by quadratic binomials, it suffices to show that any binomial in
$I_T$ can be written as a polynomial linear combination of the quadratic binomials in $I_T$. This, in
turn, will be achieved by showing that we can "move" from the tableau representation of one
of the monomials to the other by making local changes that correspond to quadratic binomials.
To show this last part, we will define a sort of distance between two monomials and show that
it is always possible to decrease this distance using these quadratic binomials/moves. This is a
typical trick for dealing with toric ideals, illustrated, for instance, in [19].

To this end let $f$ be a binomial in $I_T$. Without loss of generality, we may suppose the terms
of $f$ have no common factors, because if $\sigma_{ij} \cdot f \in I_T$ then $f \in I_T$ as well. We will write $f$ as the
difference of two tableaux, which are in standard form with their rows lexicographically ordered.
The first rows of the two tableaux are different and they have a left-most place where they
disagree. We will show that we can always move this position further to the right. Eventually
the top rows of the tableaux will agree and we can delete this row (corresponding to the same
variable) and arrive at a polynomial of smaller degree.

Since $f \in I_T$, the treks associated to the top rows of the two tableaux must have the same
top. There are two cases to consider. Either the first disagreement is immediately after the top
or not. In the first case, this means that the binomial $f$ must have the form:

$$\left[\begin{array}{c|c} abB & acC \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} abB & adD \\ \vdots & \vdots \end{array}\right].$$

Without loss of generality we may suppose that $c < d$. Since $f \in I_T$ the string $ac$ must appear
somewhere on the right-hand monomial. Thus, $f$ must have the form:

$$\left[\begin{array}{c|c} abB & acC \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} abB & adD \\ aeE & acC \\ \vdots & \vdots \end{array}\right].$$


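If $d \neq e$, we can apply the quadratic binomial

$$\left[\begin{array}{c|c} abB & adD \\ aeE & acC \end{array}\right]
- \left[\begin{array}{c|c} abB & acC \\ aeE & adD \end{array}\right]$$

to the second monomial to arrive at a monomial which has fewer disagreements with the
left-hand tableau in the first row. On the other hand, if $d = e$, we cannot apply this move (its
application results in "variables" that do not belong to $\mathbb{C}[\Sigma]$). Keeping track of all the $ad$
patterns that appear on the right-hand side, and the consequent $ad$ patterns that appear on the
left-hand side, we see that our binomial $f$ has the form

$$\left[\begin{array}{c|c} abB & acC \\ ad* & * \\ \vdots & \vdots \\ ad* & * \end{array}\right]
- \left[\begin{array}{c|c} abB & adD \\ adD' & acC \\ ad* & * \\ \vdots & \vdots \\ ad* & * \end{array}\right].$$

Since there are the same number of $ad$'s on both sides we see that there is at least one more $a$
on the right-hand side which has no $d$'s attached to it. Thus, omitting the excess $ad$'s on both
sides, our binomial $f$ contains:

$$\left[\begin{array}{c|c} abB & acC \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} abB & adD \\ adD' & acC \\ aeE & agG \\ \vdots & \vdots \end{array}\right]$$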



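with $d \neq e$ or $g$. We can also assume that $c \neq e, g$; otherwise, we could apply a quadratic move
as above. Thus we apply the quadratic binomials

$$\left[\begin{array}{c|c} adD' & acC \\ aeE & agG \end{array}\right]
- \left[\begin{array}{c|c} adD' & agG \\ aeE & acC \end{array}\right]$$

and

$$\left[\begin{array}{c|c} abB & adD \\ aeE & acC \end{array}\right]
- \left[\begin{array}{c|c} abB & acC \\ aeE & adD \end{array}\right]$$

to reduce the number of disagreements in the first row. This concludes the proof of the first
case. Now suppose that the first disagreement does not occur immediately after the $a$. Thus we
may suppose that $f$ has the form:

$$\left[\begin{array}{c|c} aAxbB & aC \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} aAxdD & aE \\ \vdots & \vdots \end{array}\right].$$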

Note that it does not matter whether or not this disagreement appears on the left-hand or
right-hand side of the tableaux. Since the string $xd$ appears on the right-hand monomial it must
also appear somewhere on the left-hand monomial as well. If $x$ is not the top in this occurrence,
we can immediately apply a quadratic binomial to reduce the discrepancies in the first row. So
we may assume that $f$ has the form:

$$\left[\begin{array}{c|c} aAxbB & aC \\ xdD' & xgG \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} aAxdD & aE \\ \vdots & \vdots \end{array}\right].$$

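If $b \neq g$ we can apply the quadratic binomial

$$\left[\begin{array}{c|c} aAxbB & aC \\ xdD' & xgG \end{array}\right]
- \left[\begin{array}{c|c} aAxdD' & aC \\ xbB & xgG \end{array}\right]$$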

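to the left-hand monomial to reduce the discrepancies in the first row. So suppose that $g = b$.
Enumerating the $xb$ pairs that can arise on the left and right hand monomials, we deduce, akin
to our argument in the first case above, that $f$ has the form:

$$\left[\begin{array}{c|c} aAxbB & aC \\ xdD' & xbG \\ xhH & xkK \\ \vdots & \vdots \end{array}\right]
- \left[\begin{array}{c|c} aAxdD & aE \\ \vdots & \vdots \end{array}\right]$$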

where $h$ and $k$ are not equal to $b$ or $d$. Then we can apply the two quadratic binomials:

$$\left[\begin{array}{c|c} xdD' & xbG \\ xhH & xkK \end{array}\right]
- \left[\begin{array}{c|c} xhH & xbG \\ xdD' & xkK \end{array}\right]$$

and

$$\left[\begin{array}{c|c} aAxbB & aC \\ xdD' & xkK \end{array}\right]
- \left[\begin{array}{c|c} aAxdD' & aC \\ xbB & xkK \end{array}\right]$$

to the left-hand monomial to produce a monomial with fewer discrepancies in the first row. We
have shown that no matter what type of discrepancy occurs in the first row, we can
always apply quadratic moves to produce fewer discrepancies. This implies that $I_T$ is generated
by quadrics. □

Among the results in this section were our proofs that $I_T$ has a squarefree initial ideal (and
hence $\mathbb{C}[\Sigma]/I_T$ is normal and Cohen-Macaulay) and that $I_T$ is generated by linear forms and
quadrics. It seems natural to wonder if there is a term order that realizes these two features
simultaneously.

Conjecture 5.9. There exists a term order $\prec$ such that $\mathrm{in}_\prec(I_T)$ is generated by squarefree
monomials of degree one and two.

6. Hidden Trees 

This section and the next concern Bayesian networks with hidden variables. A hidden or 
latent random variable is one which we do not have direct access to. These hidden variables 
might represent theoretical quantities that are directly unmeasurable (e.g. a random variable 
representing intelligence), variables we cannot have access to (e.g. information about extinct 
species), or variables that have been censored (e.g. for sensitive random variables in census 
data). If we are given a model over all the observed and hidden random variables, the partially 
observed model is the one obtained by marginalizing over the hidden random variables. A
number of interesting varieties arise in this hidden variable setting. 

For Gaussian random variables, the marginalization is again Gaussian, and the mean and 
covariance matrix are obtained by extracting the subvector and submatrix of the mean and 
covariance matrix corresponding to the observed random variables. This immediately yields the 
following proposition. 

Proposition 6.1. Let $I \subseteq \mathbb{C}[\mu, \Sigma]$ be the vanishing ideal for a Gaussian model. Let $H \cup O = [n]$
be a partition of the random variables into hidden and observed variables $H$ and $O$. Then

$$I_O := I \cap \mathbb{C}[\mu_i, \sigma_{ij} \mid i,j \in O]$$

is the vanishing ideal for the partially observed model.

Proof. Marginalization in the Gaussian case corresponds to projection onto the subspace of pairs
$(\mu_O, \Sigma_{O,O}) \subseteq \mathbb{R}^{|O|} \times \mathbb{R}^{\binom{|O|+1}{2}}$. Coordinate projection is equivalent to elimination [2]. □



In the case of a Gaussian Bayesian network, Proposition 6.1 has a number of useful corollaries,
of both a computational and theoretical nature. First of all, it allows for the computation of the
ideals defining a hidden variable model as an easy elimination step. Secondly, it can be used to
explain the phenomenon we observed in Example 2.13 that the constraints defining a hidden
variable model appeared as generators of the ideal of the fully observed model.

Definition 6.2. Let $H \cup O$ be a partition of the nodes of the DAG $G$. The hidden nodes $H$
are said to be upstream from the observed nodes $O$ in $G$ if there are no edges $o \to h$ in $G$ with
$o \in O$ and $h \in H$.

If $H \cup O$ is an upstream partition of the nodes of $G$, we introduce a grading on the ring $\mathbb{C}[a, \lambda]$
which will, in turn, induce a grading on $\mathbb{C}[\Sigma]$. Let $\deg a_h = (1,0)$ for all $h \in H$, $\deg a_o = (1,2)$
for all $o \in O$, $\deg \lambda_{ho} = (0,1)$ if $h \in H$ and $o \in O$, and $\deg \lambda_{ij} = (0,0)$ otherwise.

Lemma 6.3. Suppose that $H \cup O = [n]$ is an upstream partition of the vertices of $G$. Then each
of the polynomials $\phi_G(\sigma_{ij})$ is homogeneous with respect to the upstream grading and

$$\deg(\sigma_{ij}) = \begin{cases}
(1,0) & \text{if } i \in H,\ j \in H, \\
(1,1) & \text{if } i \in H,\ j \in O \text{ or } i \in O,\ j \in H, \\
(1,2) & \text{if } i \in O,\ j \in O.
\end{cases}$$

Thus, $I_G$ is homogeneous with respect to the induced grading on $\mathbb{C}[\Sigma]$.

Proof. There are three cases to consider. If both $i, j \in H$, then every trek in $T(i,j)$ has a top
element in $H$ and no edges of the form $h \to o$. In this case, the degree of each path is the vector
$(1,0)$. If $i \in H$ and $j \in O$, every trek from $i$ to $j$ has a top in $H$ and exactly one edge of the
form $h \to o$. Thus, the degree of every monomial in $\phi_G(\sigma_{ij})$ is $(1,1)$. If both $i, j \in O$, then
each trek $P$ from $i$ to $j$ either has a top in $O$ or has a top in $H$. In the first case there can be no
edges in $P$ of the form $h \to o$, and in the second case there must be exactly two edges in $P$ of
the form $h \to o$. In either case, the degree of the monomial corresponding to $P$ is $(1,2)$. □
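
The case analysis in this proof can be replayed on a small example. The sketch below is only an illustration: the DAG, the node names, and the helper functions are hypothetical, and treks are taken to be simple (the two legs share only the top). It computes the upstream degree of every trek monomial and confirms that all treks between a given pair of vertices carry the degree predicted by Lemma 6.3.

```python
# A small DAG with an upstream partition (hypothetical example).
HIDDEN = {"g", "h"}
OBSERVED = {"o1", "o2", "o3"}
EDGES = [("g", "h"), ("h", "o1"), ("h", "o2"), ("o1", "o3"), ("o2", "o3")]
NODES = sorted(HIDDEN | OBSERVED)


def directed_paths(src, dst):
    """Return every directed path src -> ... -> dst as a list of vertices."""
    if src == dst:
        return [[src]]
    paths = []
    for u, v in EDGES:
        if u == src:
            paths.extend([src] + rest for rest in directed_paths(v, dst))
    return paths


def trek_degrees(i, j):
    """Set of upstream degrees of the trek monomials between i and j."""
    degrees = set()
    for top in NODES:
        for left in directed_paths(top, i):
            for right in directed_paths(top, j):
                if set(left) & set(right) != {top}:
                    continue  # legs may share only the top
                # deg a_h = (1, 0) for hidden tops, deg a_o = (1, 2) for observed tops.
                second = 0 if top in HIDDEN else 2
                for leg in (left, right):
                    for u, v in zip(leg, leg[1:]):
                        if u in HIDDEN and v in OBSERVED:
                            second += 1  # deg lambda_ho = (0, 1)
                degrees.add((1, second))
    return degrees


def expected_degree(i, j):
    """Degree predicted by Lemma 6.3: (1, number of observed endpoints)."""
    return (1, (i in OBSERVED) + (j in OBSERVED))


all_homogeneous = all(
    trek_degrees(i, j) == {expected_degree(i, j)} for i in NODES for j in NODES
)
print(all_homogeneous)
```

The pair `("o1", "o3")` is the instructive one: it has a trek with top `o1` and a trek with top `h`, and both have degree $(1,2)$.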

Note that the two dimensional grading we have described can be extended to an n dimensional 
grading on the ring C[S] by considering all collections of upstream variables in G simultaneously. 
Theorem 6.4 (Upstream Variables Theorem). Let $H \cup O$ be an upstream partition of the vertices
of $G$. Then every minimal generating set of $I_G$ that is homogeneous with respect to the upstream
grading contains a minimal generating set of $I_{G,O}$.

Proof. The set of indeterminates $\sigma_{ij}$ corresponding to the observed variables are precisely the
variables whose degrees lie on the facet of the degree semigroup generated by the vector $(1,2)$.
This implies that the subring generated by these indeterminates is a facial subring. □

The upstream variables theorem is significant because any natural generating set of an ideal
$I$ is homogeneous with respect to its largest homogeneous grading group. For instance, every
reduced Gröbner basis of $I_G$ will be homogeneous with respect to the upstream grading. For
trees, the upstream variables theorem immediately implies:

Corollary 6.5. Let $T$ be a rooted directed tree and $O$ consist of the leaves of $T$. Then $I_{T,O}$ is
generated by the quadratic tetrad constraints

$$\sigma_{ik}\sigma_{jl} - \sigma_{il}\sigma_{kj}$$

such that $i,j,k,l \in O$, and there is a choke point $c$ between $\{i,j\}$ and $\{k,l\}$.



Corollary 6.5 says that the ideal of a hidden tree model is generated by the tetrad constraints
induced by the choke points in the tree. Spirtes et al. [17] use these tetrad constraints as a tool
for inferring DAG models with hidden variables. Given a sample covariance matrix, they test
whether a collection of tetrad constraints is equal to zero. From the given tetrad constraints
that are satisfied, together with the tetrad representation theorem, they construct a DAG that
is consistent with these vanishing tetrads. However, it is not clear from that work whether or
not it is enough to consider only these tetrad constraints. Indeed, as shown in [17], there are
pairs of graphs with hidden nodes that have precisely the same set of tetrad constraints that do
not yield the same family of covariance matrices. Corollary 6.5 can be seen as a mathematical
justification of the tetrad procedure of Spirtes et al. in the case of hidden tree models, because
it shows that the tetrad constraints are enough to distinguish between the covariance matrices
coming from different trees.



7. Connections to Algebraic Geometry 

In this section, we give families of examples to show how classical varieties from algebraic 
geometry arise in the study of Gaussian Bayesian networks. In particular, we show how toric 
degenerations of the Grassmannian, matrix Schubert varieties, and secant varieties all arise as 
special cases of Gaussian Bayesian networks with hidden variables. 

7.1. Toric Initial Ideals of the Grassmannian. Let $Gr_{2,n}$ be the Grassmannian of 2-planes
in $\mathbb{C}^n$. The Grassmannian has the natural structure of an algebraic variety under the Plücker
embedding. The ideal of the Grassmannian is generated by the quadratic Plücker relations:

$$I_{2,n} := I(Gr_{2,n}) = \langle \sigma_{ij}\sigma_{kl} - \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} \mid 1 \leq i < j < k < l \leq n \rangle \subset \mathbb{C}[\Sigma].$$
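
As a quick sanity check of the three-term relations, the following sketch takes the $2 \times 2$ minors of a $2 \times n$ integer matrix as Plücker coordinates and evaluates every relation on them. The matrix entries and function names are arbitrary choices of ours; indices are 0-based in the code.

```python
from itertools import combinations


def pluecker_coordinates(matrix):
    """2x2 minors p[i, j] (i < j) of a 2 x n matrix given as two rows."""
    top, bottom = matrix
    n = len(top)
    return {
        (i, j): top[i] * bottom[j] - top[j] * bottom[i]
        for i, j in combinations(range(n), 2)
    }


def pluecker_relations(p, n):
    """Values of p_ij p_kl - p_ik p_jl + p_il p_jk over all i < j < k < l."""
    return [
        p[i, j] * p[k, l] - p[i, k] * p[j, l] + p[i, l] * p[j, k]
        for i, j, k, l in combinations(range(n), 4)
    ]


matrix = ([1, 2, 3, 5, 8], [1, 3, 7, 11, 2])  # arbitrary integer entries
p = pluecker_coordinates(matrix)
relations = pluecker_relations(p, 5)
print(relations)
```

For $n = 5$ there are $\binom{5}{4} = 5$ relations, and each evaluates to zero on the minors of any $2 \times 5$ matrix.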

The binomial initial ideals of $I_{2,n}$ are in bijection with the unrooted trivalent trees with $n$
leaves. These binomial initial ideals are, in fact, toric ideals, and we will show that:

Theorem 7.1. Let $T$ be a rooted directed binary tree with $[n]$ leaves and let $O$ be the set of
leaves of $T$. Then there is a weight vector $\omega \in \mathbb{R}^{\binom{n}{2}}$ and a sign vector $\tau \in \{\pm 1\}^{\binom{n}{2}}$ such that

$$I_{T,O} = \tau \cdot \mathrm{in}_\omega(I_{2,n}).$$

The sign vector $\tau$ acts by multiplying coordinate $\sigma_{ij}$ by $\tau_{ij}$.

Proof. The proof idea is to show that the toric ideals $I_{T,O}$ have the same generators as the toric
initial ideals of the Grassmannian that have already been characterized in [16]. Without loss
of generality, we may suppose that the leaves of $T$ are labeled by $[n]$, that the tree is drawn
without edge crossings, and the leaves are labeled in increasing order from left to right. These
assumptions will allow us to ignore the sign vector $\tau$ in the proof. The sign vector results from
straightening the tree and permuting the columns in the Stiefel coordinates. This results in sign
changes in the Plücker coordinates.

In Corollary 6.5 we saw that $I_{T,O}$ was generated by the quadratic relations

$$\sigma_{ik}\sigma_{jl} - \sigma_{il}\sigma_{kj}$$

such that there is a choke point in $T$ between $\{i,j\}$ and $\{k,l\}$. This is the same as saying
that the induced subtree of $T$ on $\{i,j,k,l\}$ has the split $\{i,j\}|\{k,l\}$. These are precisely the
generators of the toric initial ideals of the Grassmannian $Gr_{2,n}$ identified in [16]. □

In the preceding theorem, any weight vector $\omega$ that belongs to the relative interior of the cone
of the tropical Grassmannian corresponding to the tree $T$ will serve as the desired partial term
order. We refer to [16] for background on the tropical Grassmannian and toric degenerations of
the Grassmannian. Since an ideal and its initial ideals have the same Hilbert function, we see
Catalan numbers emerging as degrees of Bayesian networks yet again.

Corollary 7.2. Let $T$ be a rooted, directed, binary tree and $O$ consist of the leaves of $T$. Then
$\deg I_{T,O} = \frac{1}{n-1}\binom{2n-4}{n-2}$, the $(n-2)$-nd Catalan number.

The fact that binary hidden tree models are toric degenerations of the Grassmannian has
potential use in phylogenetics. Namely, it suggests a family of new models, of the same
dimension as the binary tree models, that could be used to interpolate between the various tree
models. That is, rather than choosing a weight vector in a full dimensional cone of the tropical
Grassmannian, we could choose a weight vector $\omega$ that sits inside of a lower dimensional cone.
The varieties of the initial ideals $V(\mathrm{in}_\omega(I_{2,n}))$ then correspond to models that sit somewhere
"between" the models corresponding to the full dimensional trees of the maximal dimensional cones
containing $\omega$. Phylogenetic recovery algorithms could reference these in-between models to
indicate some uncertainty about the relationships between a given collection of species or on a given
branch of the tree. These new models have the advantage that they have the same dimension
as the tree models and so there is no need for dimension penalization in model selection.

7.2. Matrix Schubert Varieties. In this section, we will describe how certain varieties called 
matrix Schubert varieties arise as special cases of the varieties of hidden variable models for 
Gaussian Bayesian networks. More precisely, the variety for the Gaussian Bayesian network will 
be the cone over one of these matrix Schubert varieties. To do this, we first need to recall some 
equivalent definitions of matrix Schubert varieties. 

Let $w$ be a partial permutation matrix, which is an $n \times n$ 0/1 matrix with at most one 1 in
each row and column. The matrix $w$ is in the affine space $\mathbb{C}^{n \times n}$. The Borel group $B$ of upper
triangular matrices acts on $\mathbb{C}^{n \times n}$ on the right by multiplication and on the left by multiplication
by the transpose.

Definition 7.3. The matrix Schubert variety $X_w$ is the orbit closure of $w$ by the action of $B$
on the right and left:

$$X_w = \overline{B^{T} w B}.$$

Let $I_w$ be the vanishing ideal of $X_w$.

The matrix Schubert variety $X_w \subseteq \mathbb{C}^{n \times n'}$, so we can identify its coordinate ring with a quotient
of $\mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']]$. Throughout this section $[n'] = \{1', 2', \ldots, n'\}$ is a set of $n$ symbols
that we use to distinguish from $[n] = \{1, 2, \ldots, n\}$.

An equivalent definition of a matrix Schubert variety comes as follows. Let $S(w) = \{(i,j) \mid w_{ij} = 1\}$
be the index set of the ones in $w$. For each $(i,j)$ let $M_{ij}$ be the variety of rank one matrices:

$$M_{ij} = \{x \in \mathbb{C}^{n \times n} \mid \operatorname{rank} x \le 1,\ x_{kl} = 0 \text{ if } k < i \text{ or } l < j\}.$$

Then

$$X_w = \sum_{(i,j) \in S(w)} M_{ij}$$

where the sum denotes the pointwise Minkowski sum of the varieties. Since the $M_{ij}$ are cones over
projective varieties, this is the same as taking the join, defined in the next section.

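Example 7.4. Let $w$ be the partial permutation matrix

$$w = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Then $X_w$ consists of all $3 \times 3$ matrices of rank $\le 2$ and $I_w = \langle |\Sigma_{[3],[3']}| \rangle$. More generally, if $w$ is
a partial permutation matrix of the form

$$w = \begin{pmatrix} E_d & 0 \\ 0 & 0 \end{pmatrix},$$

where $E_d$ is a $d \times d$ identity matrix, then $I_w$ is the ideal of $(d+1)$ minors of a generic matrix. □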
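
The Minkowski sum description can be checked numerically for the $3 \times 3$ matrix $w$ of Example 7.4. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $S(w) = \{(1,1),(2,2)\}$, samples integer points of $M_{11}$ (an arbitrary rank one matrix) and $M_{22}$ (a rank one matrix whose first row and first column vanish), and confirms that the determinant of their sum is exactly zero. The helper names `det3`, `rank_one`, and `sample_point` are hypothetical.

```python
import random


def det3(x):
    """Exact 3x3 determinant by cofactor expansion along the first row."""
    return (x[0][0] * (x[1][1] * x[2][2] - x[1][2] * x[2][1])
            - x[0][1] * (x[1][0] * x[2][2] - x[1][2] * x[2][0])
            + x[0][2] * (x[1][0] * x[2][1] - x[1][1] * x[2][0]))


def rank_one(u, v):
    """Outer product u v^T, a matrix of rank at most one."""
    return [[ui * vj for vj in v] for ui in u]


def sample_point(rng):
    """Return a random integer point of M_11 + M_22."""
    # M_11: no vanishing conditions, any rank one matrix.
    a = rank_one([rng.randint(-9, 9) for _ in range(3)],
                 [rng.randint(-9, 9) for _ in range(3)])
    # M_22: entries x_kl vanish when k < 2 or l < 2 (first row and column).
    b = rank_one([0] + [rng.randint(-9, 9) for _ in range(2)],
                 [0] + [rng.randint(-9, 9) for _ in range(2)])
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


rng = random.Random(0)
dets = [det3(sample_point(rng)) for _ in range(100)]
print(all(d == 0 for d in dets))
```

Every sampled determinant vanishes because a sum of two matrices of rank at most one has rank at most two, in agreement with the description of $X_w$ in the example.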

The particular Bayesian networks which yield the desired varieties come from taking certain
partitions of the variables. In particular, we assume that the observed variables come in two
types labeled by $[n] = \{1, 2, \ldots, n\}$ and $[n'] = \{1', 2', \ldots, n'\}$. The hidden variables will be
labeled by the set $S(w)$.

Define the graph $G(w)$ with vertex set $V = [n] \cup [n'] \cup S(w)$ and edge set consisting of edges
$k \to l$ for all $k < l \in [n]$, $k' \to l'$ for all $k' < l' \in [n']$, $(i,j) \to k$ for all $(i,j) \in S(w)$ and $k \ge i$,
and $(i,j) \to k'$ for all $(i,j) \in S(w)$ and $k' \ge j$.

Theorem 7.5. The generators of the ideal $I_w$ defining the matrix Schubert variety $X_w$ are the
same as the generators of the ideal $I_{G(w),[n] \cup [n']}$ of the hidden variable Bayesian network for the
DAG $G(w)$ with observed variables $[n] \cup [n']$. That is,

$$I_w \cdot \mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']] = I_{G(w),[n] \cup [n']}.$$

Proof. The proof proceeds in a few steps. First, we give a parametrization of a cone over
the matrix Schubert variety, whose ideal is naturally seen to be $I_w \cdot \mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']]$.
Then we describe a rational transformation $\phi$ on $\mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']]$ such that
$\phi(I_w) = I_{G(w),[n] \cup [n']}$. We then exploit the fact that this transformation is invertible and that the
elimination ideal $I_{G(w),[n] \cup [n']} \cap \mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']]$ is fixed to deduce the desired equality.

First of all, we give our parametrization of the ideal $I_w$. To do this, we need to carefully
identify all parameters involved in the representation. We split the indeterminates
in the ring $\mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']]$ into three classes: those with $i,j \in [n]$, those
with $i,j \in [n']$, and those with $i \in [n]$ and $j \in [n']$. Then we define a parametrization $\phi_w$ which
is determined as follows:



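$$\phi_w : \mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']] \to \mathbb{C}[\tau, \gamma, a, \lambda]$$

$$\phi_w(\sigma_{ij}) = \begin{cases} \tau_{ij} & \text{if } i,j \in [n] \\ \gamma_{ij} & \text{if } i,j \in [n'] \\ \sum_{(k,l) \in S(w):\, k \le i,\, l \le j} a_{(k,l)} \lambda_{(k,l),i} \lambda_{(k,l),j} & \text{if } i \in [n],\ j \in [n'] \end{cases}$$
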
Let $J_w = \ker \phi_w$. Since the $\tau$, $\gamma$, $\lambda$, and $a$ parameters are all algebraically independent, we
deduce that in $J_w$ there will be no generators that involve combinations of the three types
of indeterminates in $\mathbb{C}[\sigma_{ij} \mid i,j \in [n] \cup [n']]$. Furthermore, restricting to the first two types of
indeterminates, there will not be any nontrivial relations involving these types of indeterminates.
Thus, to determine $J_w$, it suffices to restrict to the ideal among the indeterminates of the form
$\sigma_{ij}$ such that $i \in [n]$ and $j \in [n']$. However, considering the parametrization in this case, we see
that this is precisely the parametrization of the ideal $I_w$, given as the Minkowski sum of rank
one matrices. Thus, $J_w = I_w$.

Now we will define a map $\phi : \mathbb{C}[\sigma_{ij}] \to \mathbb{C}[\sigma_{ij}]$ which sends $J_w$ to another ideal, closely
related to $I_{G(w),[n] \cup [n']}$. To define this map, first, we use the fact that from the submatrix
$\Sigma_{[n],[n]}$ we can recover the $\lambda_{ij}$ and $a_i$ parameters associated to $[n]$, when only considering the
complete subgraph associated to the graph $G(w)_{[n]}$ (and ignoring the treks that involve the vertices
$(k,l) \in S(w)$). This follows because these parameters are identifiable by Proposition 2.5. A
similar fact holds when restricting to the subgraph $G(w)_{[n']}$. The ideal $J_w$ we have defined thus
far can be considered as the vanishing ideal of a parametrization which gives the complete graph
parametrization for $G(w)_{[n]}$ and $G(w)_{[n']}$ and a parametrization of the matrix Schubert variety
$X_w$ on the $\sigma_{ij}$ with $i \in [n]$ and $j \in [n']$. So we can rationally recover the $\lambda$ and $a$ parameters
associated to the subgraphs $G(w)_{[n]}$ and $G(w)_{[n']}$.

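For each $j < k$ pair in $[n]$ or in $[n']$, define the partial trek polynomial

$$s_{jk}(\lambda) = \sum_{m=1}^{k-j} \ \sum_{j = l_0 < l_1 < \cdots < l_m = k} \ \prod_{i=1}^{m} \lambda_{l_{i-1} l_i}.$$

We fit these into two upper triangular matrices $S$ and $S'$ where $S_{jk} = s_{jk}$ if $j < k$ with $j,k \in [n]$,
$S_{jj} = 1$ and $S_{jk} = 0$ otherwise, with a similar definition for $S'$ with $[n]$ replaced by $[n']$. Now
we are ready to define our map. Let $\phi$ be the rational map $\phi : \mathbb{C}[\Sigma] \to \mathbb{C}[\Sigma]$ which leaves $\sigma_{ij}$
fixed if $i,j \in [n]$ or $i,j \in [n']$, and maps $\sigma_{ij}$ with $i \in [n]$ and $j \in [n']$ by sending

$$\Sigma_{[n],[n']} \mapsto S^{T} \, \Sigma_{[n],[n']} \, S'.$$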

This is actually a rational map, because the $\lambda_{ij}$ that appear in the formula for $s_{jk}$ are expressed
as rational functions in terms of the $\sigma_{ij}$ by the rational parameter recovery formula of Proposition
2.5. Since this map transforms $\Sigma_{[n],[n']}$ by multiplying on the left and right by lower and upper
triangular matrices, this leaves the ideal $J_w \cap \mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']]$ fixed. Thus $J_w \subseteq \phi(J_w)$.
On the other hand $\phi$ is invertible on $J_w$, so $J_w = \phi(J_w)$.

If we think about the formulas for the image $\phi \circ \phi_w$, we see that the formulas for $\sigma_{ij}$ with $i \in [n]$
and $j \in [n']$ in terms of parameters are the correct formulas which we would see coming from
the parametrization $\phi_{G(w)}$. On the other hand, the formulas for $\sigma_{ij}$ with $i,j \in [n]$ or $i,j \in [n']$
are the formulas for the restricted graphs $G(w)_{[n]}$ and $G(w)_{[n']}$, respectively. Since every trek contained
in $G(w)_{[n]}$ or $G(w)_{[n']}$ is a trek in $G(w)$, we see that the current parametrization is only "almost
correct", in that it is only missing terms corresponding to treks that go outside of $G(w)_{[n]}$ or
$G(w)_{[n']}$. Denote this map by $\psi_w$, and let $\phi_{G(w)}$ be the actual parametrizing map of the model.
Thus, we have, for each $\sigma_{ij}$ with $i,j \in [n]$ or $i,j \in [n']$, $\phi_{G(w)}(\sigma_{ij}) = \psi_w(\sigma_{ij}) + r_w(\sigma_{ij})$, where
$r_w(\sigma_{ij})$ is a polynomial remainder term that does not contain any $a_k$ with $k \in [n] \cup [n']$, when
$i,j \in [n]$ or $i,j \in [n']$, and $r_w(\sigma_{ij}) = 0$ otherwise. On the other hand, every term of $\psi_w(\sigma_{ij})$ will
involve exactly one $a_k$ with $k \in [n] \cup [n']$, when $i,j \in [n]$ or $i,j \in [n']$.

Now we define a weight ordering $\prec$ on the ring $\mathbb{C}[a, \lambda]$ that gives $\deg a_k = 1$ if $k \in [n] \cup [n']$ and
$\deg a_k = 0$ otherwise and $\deg \lambda_{ij} = 0$ for all $i,j$. Then, the largest degree term of $\phi_{G(w)}(\sigma_{ij})$ with
respect to this weight ordering is $\psi_w(\sigma_{ij})$. Since highest weight terms must all cancel with each
other, we see that $f \in I_{G(w),[n] \cup [n']}$ implies that $f \in J_w$. Thus, we deduce that $I_{G(w),[n] \cup [n']} \subseteq J_w$.
On the other hand,

$$I_{G(w),[n] \cup [n']} \cap \mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']] = J_w \cap \mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']]$$

and since the generators of $J_w \cap \mathbb{C}[\sigma_{ij} \mid i \in [n], j \in [n']]$ generate $J_w$, we deduce that
$J_w \subseteq I_{G(w),[n] \cup [n']}$, which completes the proof. □



The significance of Theorem 7.5 comes from the work of Knutson and Miller. They gave
a complete description of antidiagonal Gröbner bases for the ideals $I_w$. Indeed, these ideals
are generated by certain subdeterminants of the matrix $\Sigma_{[n],[n']}$. These determinants can be
interpreted combinatorially in terms of the graph $G(w)$.



Theorem 7.6. [11] The ideal $I_w$ defining the matrix Schubert variety is generated by the
conditional independence statements implied by the DAG $G(w)$. In particular,

$$I_w = \langle (\#C + 1) \text{ minors of } \Sigma_{A,B} \mid A \subseteq [n],\ B \subseteq [n'],\ C \subseteq S(w), \text{ and } C \text{ d-separates } A \text{ from } B \rangle.$$

7.3. Joins and Secant Varieties. In this section, we will show how joins and secant varieties
arise as special cases of Gaussian Bayesian networks in the hidden variable case. This, in turn,
implies that techniques that have been developed for studying defining equations of joins and
secant varieties (e.g. [12, 20]) might be useful for studying the equations defining these hidden
variable models.

Given two ideals $I$ and $J$ in a polynomial ring $K[x] = K[x_1, \ldots, x_m]$, their join is the new
ideal

$$I * J := \left( I(y) + J(z) + \langle x_i - y_i - z_i \mid i \in [m] \rangle \right) \cap K[x]$$
where $I(y)$ is the ideal obtained from $I$ by plugging in the variables $y_1, \ldots, y_m$ for $x_1, \ldots, x_m$.
The secant ideal is the iterated join:

$$I^{\{r\}} = I * I * \cdots * I$$

with $r$ copies of $I$. If $I$ and $J$ are homogeneous radical ideals over an algebraically closed field,
the join ideal $I * J$ is the vanishing ideal of the join variety which is defined geometrically by
the rule

$$V(I * J) = V(I) * V(J) = \overline{\bigcup_{a \in V(I)} \ \bigcup_{b \in V(J)} \langle a, b \rangle}$$

where $\langle a, b \rangle$ denotes the line spanned by $a$ and $b$ and the bar denotes the Zariski closure.

Suppose further that $I$ and $J$ are the vanishing ideals of parametrizations; that is, there are
$\phi$ and $\psi$ such that

$$\phi : \mathbb{C}[x] \to \mathbb{C}[\theta] \quad \text{and} \quad \psi : \mathbb{C}[x] \to \mathbb{C}[\eta]$$

and $I = \ker \phi$ and $J = \ker \psi$. Then $I * J$ is the kernel of the map

$$x_i \mapsto \phi(x_i) + \psi(x_i).$$

Given a DAG $G$ and a subset $K \subseteq V(G)$, $G_K$ denotes the induced subgraph on $K$.

Proposition 7.7. Let $G$ be a DAG and suppose that the vertices of $G$ are partitioned into
$V(G) = O \cup H_1 \cup H_2$ where both $H_1$ and $H_2$ are hidden sets of variables. Suppose further that
there are no edges of the form $o_1 \to o_2$ such that $o_1, o_2 \in O$ or edges of the form $h_1 \to h_2$ or
$h_2 \to h_1$ with $h_1 \in H_1$ and $h_2 \in H_2$. Then

$$I_{G,O} = I_{G_{O \cup H_1},O} * I_{G_{O \cup H_2},O}.$$

The proposition says that if the hidden variables are partitioned with no edges between the
two sets and there are no edges between the observed variables, then the ideal is a join.

Proof. The parametrization of the hidden variable model only involves the $\sigma_{ij}$ such that $i,j \in O$.
First, we restrict to the case where $i \ne j$. Since there are no edges between observed variables
and no edges between $H_1$ and $H_2$, every trek from $i$ to $j$ involves only edges in $G_{O \cup H_1}$ or only
edges in $G_{O \cup H_2}$. This means that

$$\phi_G(\sigma_{ij}) = \phi_{G_{O \cup H_1}}(\sigma_{ij}) + \phi_{G_{O \cup H_2}}(\sigma_{ij})$$

and these summands are in non-overlapping sets of indeterminates. Thus, by the comments
preceding the proposition, the ideal only in the $\sigma_{ij}$ with $i \ne j \in O$ is clearly a join. However,
the structure of this hidden variable model implies that there are no nontrivial relations that
involve the diagonal elements $\sigma_{ii}$ with $i \in O$. This implies that $I_{G,O}$ is a join. □

Example 7.8. Let $K_{p,m}$ be the directed complete bipartite graph with bipartition $H = [p']$
and $O = [m]$ such that $i' \to j \in E(K_{p,m})$ for all $i' \in [p']$ and $j \in [m]$. Then $K_{p,m}$ satisfies the
conditions of the proposition recursively up to $p$ copies, and we see that:

$$I_{K_{p,m},O} = I_{K_{1,m},O}^{\{p\}}.$$

This particular hidden variable Gaussian Bayesian network is known as the factor analysis model.
This realization of the factor analysis model as a secant variety was studied extensively in [3].
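
To make the join structure of Example 7.8 concrete, the following sketch compares a one-factor model with the sum of two one-factor parametrizations on $m = 4$ observed variables. It rests on assumptions made here for illustration: every hidden variance is set to 1, so the off-diagonal covariances are $\sigma_{ij} = \sum_k \lambda_{ki}\lambda_{kj}$, and the helper names `offdiag_cov` and `tetrad` are hypothetical. The quadric $\sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24}$ vanishes for one factor but need not vanish once a second factor is added.

```python
def offdiag_cov(loadings):
    """Off-diagonal covariances sigma_ij = sum_k lam[k][i] * lam[k][j], i < j.

    Assumes unit variance for every hidden variable (a simplification
    chosen for this illustration). Indices are 0-based.
    """
    m = len(loadings[0])
    return {(i, j): sum(lam[i] * lam[j] for lam in loadings)
            for i in range(m) for j in range(i + 1, m)}


def tetrad(s):
    """sigma_12 sigma_34 - sigma_13 sigma_24, written with 0-based indices."""
    return s[(0, 1)] * s[(2, 3)] - s[(0, 2)] * s[(1, 3)]


# One hidden factor: the parametrization of K_{1,4}.
one_factor = offdiag_cov([[1, 2, 3, 4]])
# Two hidden factors: the sum of two one-factor parametrizations.
two_factor = offdiag_cov([[1, 2, 3, 4], [1, 0, 1, 0]])

print(tetrad(one_factor), tetrad(two_factor))
```

With these particular loadings the first value is 0 and the second is -8, so the quadric lies in the one-factor ideal but not in its join with itself.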




Example 7.9. Consider the two "doubled trees" pictured in the figure.

[Figure: the two doubled trees.]

Since in each case the two subgraphs $G_{O \cup H_1}$ and $G_{O \cup H_2}$ are isomorphic, the ideals are secant
ideals of the hidden tree models $I_{T,O}$ for the appropriate underlying trees. In both cases, the
ideal $I_{T,O}^{\{2\}} = I_{G,O}$ is a principal ideal, generated by a single cubic. In the first case, the ideal
is the determinantal ideal $J_T^{\{2\}} = \langle |\Sigma_{123,456}| \rangle$. In the second case, the ideal is generated by an
eight term cubic



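$$I_{G,O} = \langle \sigma_{13}\sigma_{25}\sigma_{46} - \sigma_{13}\sigma_{26}\sigma_{45} - \sigma_{14}\sigma_{25}\sigma_{36} + \sigma_{14}\sigma_{26}\sigma_{35} + \sigma_{15}\sigma_{23}\sigma_{46} - \sigma_{15}\sigma_{24}\sigma_{36} - \sigma_{16}\sigma_{23}\sigma_{45} + \sigma_{16}\sigma_{24}\sigma_{35} \rangle.$$

□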



In both of the cubic cases in the previous example, the ideals in question were secant
ideals of toric ideals that were initial ideals of the Grassmann-Plücker ideal, as we saw in Theorem
7.1. Note also that the secant ideals $I_{T,O}^{\{2\}}$ are, in fact, the initial terms of the $6 \times 6$ Pfaffian with
respect to appropriate weight vectors. We conjecture that this pattern holds in general.
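
The remark about the $6 \times 6$ Pfaffian can be sanity-checked at the level of monomials. The terms of the Pfaffian correspond to the 15 perfect matchings of $\{1,\ldots,6\}$, so the sketch below enumerates those matchings and confirms that the six monomials of $|\Sigma_{123,456}|$ and the eight monomials of the cubic in Example 7.9 all occur among them. Signs and weight vectors are ignored, the cubic's monomials are entered as read from the displayed generator, and the function name `perfect_matchings` is hypothetical.

```python
from itertools import permutations


def perfect_matchings(points):
    """Yield all perfect matchings of a sorted list as tuples of pairs."""
    if not points:
        yield ()
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in perfect_matchings(remaining):
            yield ((first, partner),) + tail


# Monomials of the 6x6 Pfaffian, one per perfect matching of {1,...,6}.
pfaffian_terms = {frozenset(m) for m in perfect_matchings([1, 2, 3, 4, 5, 6])}

# Monomials of det(Sigma_{123,456}): sigma_{1a} sigma_{2b} sigma_{3c}
# with (a, b, c) a permutation of (4, 5, 6).
det_terms = {frozenset(zip((1, 2, 3), p)) for p in permutations((4, 5, 6))}

# Monomials of the eight term cubic (signs ignored).
cubic_terms = {frozenset(t) for t in [
    ((1, 3), (2, 5), (4, 6)), ((1, 3), (2, 6), (4, 5)),
    ((1, 4), (2, 5), (3, 6)), ((1, 4), (2, 6), (3, 5)),
    ((1, 5), (2, 3), (4, 6)), ((1, 5), (2, 4), (3, 6)),
    ((1, 6), (2, 3), (4, 5)), ((1, 6), (2, 4), (3, 5)),
]}

print(len(pfaffian_terms),
      det_terms <= pfaffian_terms,
      cubic_terms <= pfaffian_terms)
```

Both subset checks succeed, which is consistent with each cubic being a sum of a subset of the Pfaffian's terms; the check says nothing about signs or about which weight vectors select these terms.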

Conjecture 7.10. Let $T$ be a binary tree with $n$ leaves and $O$ the set of leaves of $T$. Let
$I_{2,n}$ be the Grassmann-Plücker ideal, let $\omega$ be a weight vector and $\tau$ a sign vector so that
$I_{T,O} = \tau \cdot \mathrm{in}_\omega(I_{2,n})$ as in Theorem 7.1. Then for each $r$, $I_{T,O}^{\{r\}}$